METHOD OF INDEXING AND IDENTIFYING MULTIMEDIA DOCUMENTS



The present invention relates to methods of indexing and
identifying multimedia documents.

From a general point of view, identifying a multimedia
document comprises two stages:

• a so-called "indexing" stage during which attempts are
made to characterize each document of a previously recorded
database using a finite number of parameters that can
subsequently be stored and manipulated easily; and

• a so-called "search" stage in which, following a request
made by a user, e.g. to identify a query image, a search is
made for all multimedia documents that are similar or that
satisfy the request.

Numerous methods already exist for indexing images that
rely on extracting shape attributes from objects making up the
image, if any, together with attributes for the texture or the
background of the image.

Nevertheless, known methods apply in fields that are very
specialized or that involve processing a very large amount of
information, thereby leading to complexity and slowness in
processing the information.

The present invention seeks to remedy the above-mentioned
drawbacks and to provide a method of general application for
indexing and identifying multimedia documents, the method
rationalizing the processing and leading to processing times
that are much shorter, while increasing the quality of the
results and their reliability, thus making it possible in
particular to proceed with effective searches based on
content.

In accordance with the invention, these aims are achieved
by a method of indexing multimedia documents, the method being
characterized in that it comprises the following steps:

a) for each document, identifying and extracting terms t_i
constituted by vectors characterizing properties of the
multimedia document for indexing, such as shape, texture,
color, or structure of an image, the energy, the oscillation
rate or frequency information of an audio signal, or a group
of characters of a text;

b) storing the terms t_i characterizing the properties of
the multimedia document in a term base comprising P terms;

c) determining a maximum number N of desired concepts
combining the most pertinent terms t_i, where N is an integer
less than P, with each concept c_i being designed to combine all
terms that are neighboring from the point of view of their
characteristics;

d) calculating the matrix T of distances between the
terms t_i of the term base;

e) decomposing the set P of terms t_i of the term base into
N portions P_j such that P = P_1 ∪ P_2 ... ∪ P_j ... ∪ P_N, each
portion P_j comprising a set of terms t_ij and being represented
by a concept c_j, the terms t_i being distributed in such a manner
that terms that are farther away are to be found in distinct
portions P_l, P_m while terms that are closer together are to be
found in the same portion P_l;

f) structuring a concept dictionary so as to constitute a
binary tree in which the leaves contain the concepts c_i of the
dictionary and the nodes of the tree contain the information
necessary for scanning the tree during a stage of identifying
a document by comparing it with previously-indexed documents;
and

g) constructing a fingerprint base made up of the set of
concepts c_i representing the terms t_i of the documents to be
indexed, each document being associated with a fingerprint
that is specific thereto.

More particularly, each concept c_i of the fingerprint base
is associated with a data set comprising the number of terms
No. T in the documents in which the concept c_i is present.

In a particular aspect of the invention, for each
document in which a concept c_i is present, a fingerprint of the
concept c_i is registered in the document, said fingerprint
containing the frequency with which the concept c_i occurs, the
identities of concepts neighboring the concept c_i in the
document, and a score which is a mean value of similarity
measurements between the concept c_i and the terms t_i of the
document that are the closest to the concept c_i.
Advantageously, the method of the invention comprises a
step of optimizing the partitioning of the set P of terms of
the term base to decompose said set P into M classes C_i (1 ≤ i
≤ M, where M ≤ P), so as to reduce the distribution error of
the set P of terms in the term base into N portions (P_1, P_2,
..., P_N) where each portion P_i is represented by the term t_i
that is taken as the concept c_i, the error ε that is committed
being such that

$$\varepsilon = \sum_{i=1}^{N} \varepsilon_{t_i}, \qquad \varepsilon_{t_i} = \sum_{t_j \in P_i} d^2(t_i, t_j)$$

where ε_{t_i} is the error committed by replacing the terms t_j
of a portion P_i with t_i.

Under such circumstances, the method may comprise the
following steps:

i) decomposing the set P of terms into two portions P_1 and
P_2;

ii) determining the two terms t_i and t_j of the set P that
are the furthest apart, corresponding to the greatest distance
of the distance matrix T;

iii) for each term t_k of the set P, examining to see
whether the distance D_ki between the term t_k and the term t_i
is less than the distance D_kj between the term t_k and the term
t_j, and if so, allocating the term t_k to the portion P_1, and
otherwise allocating the term t_k to the portion P_2; and

iv) iterating step i) until the desired number N of
portions P_i has been obtained, and on each iteration applying
the steps ii) and iii) on the terms of the portions P_1 and P_2.
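
The bisection described in steps i) to iv) can be sketched as follows. This is only an illustrative sketch, not the claimed implementation: the Euclidean distance, the function names `split_portion` and `decompose`, and the rule of always splitting the largest remaining portion are assumptions, since the text does not say which distance is used or which portion is split next.

```python
import numpy as np

def split_portion(terms, idx):
    """Split the terms indexed by idx around their farthest-apart pair (steps ii and iii)."""
    pts = terms[idx]
    # Pairwise Euclidean distances (assumed metric) restricted to this portion.
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    a, b = np.unravel_index(np.argmax(dist), dist.shape)
    # A term goes to P_1 when it is strictly closer to t_a than to t_b, else to P_2.
    closer_to_a = dist[:, a] < dist[:, b]
    return idx[closer_to_a], idx[~closer_to_a]

def decompose(terms, n_portions):
    """Repeat the two-way split until n_portions portions exist (step iv)."""
    portions = [np.arange(len(terms))]
    while len(portions) < n_portions:
        # Assumption: the portion holding the most terms is split next.
        k = max(range(len(portions)), key=lambda j: len(portions[j]))
        if len(portions[k]) < 2:
            break
        p1, p2 = split_portion(terms, portions.pop(k))
        portions += [p1, p2]
    return portions
```

Each returned array holds the row indices of the terms assigned to one portion, so a term base of P vectors is passed as an array of shape (P, dimension).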

The method of the invention may be characterized more
particularly in that it includes optimization starting from N
disjoint portions {P_1, P_2, ..., P_N} of the set P and N terms
{t_1, t_2, ..., t_N} representing them in order to reduce the
decomposition error of the set P into N portions, and in that
it comprises the following steps:

i) calculating the centers of gravity C_i of the portions
P_i;

ii) calculating errors

$$\varepsilon_{C_i} = \sum_{t_j \in P_i} d^2(C_i, t_j) \quad \text{and} \quad \varepsilon_{t_i} = \sum_{t_j \in P_i} d^2(t_i, t_j)$$

when replacing the terms t_j of the portion P_i respectively by
C_i and by t_i;

iii) comparing ε_{t_i} and ε_{C_i} and replacing t_i by C_i if
ε_{C_i} < ε_{t_i}; and

iv) calculating a new distance matrix T between the terms
t_i of the term base and repeating the process of decomposing
the set P of terms of the term base into N portions, unless a
stop condition is satisfied with

$$\frac{\varepsilon c_t - \varepsilon c_{t+1}}{\varepsilon c_t} < \text{threshold}$$

where εc_t represents the error committed at instant t.
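
One pass of this optimization can be sketched as below. The sketch assumes a squared Euclidean distance for d², reads the stop condition as a relative decrease of the total error between two successive iterations, and uses hypothetical names (`refine_representatives`, `should_stop`) that do not come from the text.

```python
import numpy as np

def refine_representatives(terms, portions, reps):
    """Replace each representative t_i by the centroid C_i when that lowers the error."""
    new_reps, total_error = [], 0.0
    for idx, t in zip(portions, reps):
        pts = terms[idx]
        c = pts.mean(axis=0)                 # center of gravity C_i of portion P_i
        err_c = np.sum((pts - c) ** 2)       # error when t_j is replaced by C_i
        err_t = np.sum((pts - t) ** 2)       # error when t_j is replaced by t_i
        new_reps.append(c if err_c < err_t else t)
        total_error += min(err_c, err_t)
    return new_reps, total_error

def should_stop(err_prev, err_next, threshold=1e-3):
    """Stop when the relative improvement falls below threshold (err_prev must be non-zero)."""
    return (err_prev - err_next) / err_prev < threshold
```

A caller would alternate this refinement with a fresh decomposition and compare successive total errors with `should_stop`; the default threshold value is arbitrary.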

In order to facilitate searching and identifying
documents, for the purpose of structuring the concept
dictionary, a navigation chart is produced iteratively,
beginning by splitting the set of concepts into two
subsets, and then selecting one subset on each iteration until
the desired number of groups is obtained or until a stop
criterion is satisfied.

The stop criterion may be constituted by the fact that
the subsets obtained are all homogeneous with small standard
deviation.

More particularly, during the structuring of the concept
dictionary, navigation indicators are determined from a matrix
M = [c_1, c_2, ..., c_N] ∈ ℝ^(p×N) of the set C of concepts
c_i ∈ ℝ^p, where c_i represents a concept of p values, by
implementing the following steps:

i) calculating a representative w of the matrix M;

ii) calculating the covariance matrix between the
elements of the matrix M and the representative w of the
matrix M;

iii) calculating a projection axis u for projecting the
elements of the matrix M;

iv) calculating the value p_i = d(u, c_i) - d(u, w) and
decomposing the set of concepts C into two subsets C1 and C2
as follows:

    c_i ∈ C1 if p_i ≤ 0
    c_i ∈ C2 if p_i > 0

v) storing the information {u, w, |p1|, p2} in the node
associated with C, where p1 is the maximum of all p_i ≤ 0 and
p2 is the minimum of all p_i > 0, the data set {u, w, |p1|, p2}
constituting the navigation indicators in the concept
dictionary.
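
A possible reading of steps i) to v) is sketched below. Several choices are assumptions rather than statements of the text: the representative w is taken as the centroid of the concepts, the projection axis u as the principal eigenvector of the covariance about w, and d(u, c_i) - d(u, w) as the difference of the projections of c_i and w onto u. The function name is hypothetical.

```python
import numpy as np

def navigation_indicators(M):
    """M has shape (p, N), one concept per column.

    Returns the indicators (u, w, |p1|, p2) and the column indices of C1 and C2.
    Requires at least two distinct concepts, otherwise C2 is empty.
    """
    w = M.mean(axis=1)                           # assumed representative: centroid
    centered = M - w[:, None]
    cov = centered @ centered.T / M.shape[1]     # covariance of the concepts about w
    _, eigvecs = np.linalg.eigh(cov)             # eigenvalues returned in ascending order
    u = eigvecs[:, -1]                           # assumed axis: principal direction
    p = u @ M - u @ w                            # assumed reading of d(u, c_i) - d(u, w)
    c1 = np.where(p <= 0)[0]
    c2 = np.where(p > 0)[0]
    p1 = p[c1].max()                             # largest non-positive value
    p2 = p[c2].min()                             # smallest positive value
    return (u, w, abs(p1), p2), c1, c2
```

Applying the function recursively to the columns of C1 and C2 would yield the binary tree, each node keeping its own indicator set.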

In a particular implementation, both the structural
components and the complements of said structural components
constituted by the textural components of an image of the
document are analyzed, and:

a) while analyzing the structural components of the
image:

a1) boundary zones of the image structures are
distributed into different classes depending on the
orientation of the local variation in intensity so as to
define structural support elements of the image; and

a2) performing statistical analysis to construct
terms constituted by vectors describing the local properties
and the global properties of the structural support elements;

b) while analyzing the textural components of the image:

b1) detecting and performing parametric
characterization of a purely random component of the image;

b2) detecting and performing parametric
characterization of a periodic component of the image; and

b3) detecting and performing parametric
characterization of a directional component of the image;

c) grouping the set of descriptive elements of the image
in a limited number of concepts constituted firstly by the
terms describing the local and global properties of structural
support elements and secondly by the parameters of the
parametric characterizations of the random, periodic, and
directional components defining the textural components of the
image; and

d) for each document, defining a fingerprint from the
occurrences, the positions, and the frequencies of said
concepts.

Advantageously, the local properties of the structural
support elements taken into consideration for constructing
terms comprise at least the support types selected from
amongst a linear strip or a curved arc, the length and width
dimensions of the support, the main direction of the support,
and the shape and the statistical properties of the pixels
constituting the support.

The global properties of the structural support elements
taken into account for constructing terms comprise at least
the number of each type of support and the spatial disposition
thereof.

Preferably, during analysis of the structural components
of the image, a prior test is performed to detect whether at
least one structure is present in the image, and in the
absence of any structure, the method passes directly to the
step of analyzing the textural components of the image.

Advantageously, in order to decompose boundary zones of
the image structures into different classes, starting from the
digitized image defined by the set of pixels y(i,j) where
(i,j) ∈ I × J, where I and J designate respectively the number
of rows and the number of columns of the image, the vertical
gradient image g_v(i,j) where (i,j) ∈ I × J and the horizontal
gradient image g_h(i,j) with (i,j) ∈ I × J are calculated, and
the image is partitioned depending on the local orientation of
its gradient into a finite number of equidistant classes, the
image containing the orientation of the gradient being defined
by the equation:

Q(i,j) = arc tan

the classes constituting support regions likely to contain
significant support elements are identified, and on the basis
of the support regions, significant support elements are
determined and indexed using predetermined criteria.

In a particular aspect of the invention, the shapes of an
image of a document are analyzed using the following steps:

a) performing multiresolution followed by decimation of
the image;

b) defining the image in polar logarithmic space;

c) representing the query image or image portion by its
Fourier transform H;

d) characterizing the Fourier transform H as follows:

d1) projecting H in a plurality of directions to
obtain a set of vectors of dimension equal to the projection
movement dimension; and

d2) calculating the statistical properties of each
projection vector; and

e) representing the shape of the image by a term t_i
constituted by values for the statistical properties of each
projection vector.

In a particular aspect of the invention, while indexing a
multimedia document comprising video signals, terms t_i are
selected that are constituted by key-images representing
groups of consecutive homogeneous images, and concepts c_i are
determined by grouping together terms t_i.

In order to determine key-images constituting terms t_i, a
score vector SV is initially generated comprising a set of
elements SV(i) representative of the difference or similarity
between the content of an image of index i and the content of
an image of index i-1, and the score vector SV is analyzed in
order to determine key-images which correspond to maximums of
the values of the elements SV(i) of the score vector SV.

More particularly, an image of index j is considered as
being a key-image if the value SV(j) of the corresponding
element of the score vector SV is a maximum and the value
SV(j) is situated between two minimums minL and minR, and if
the minimum M1 such that M1 = min(|SV(j) - minL|, |SV(j) - minR|)
is greater than a given threshold.
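
The key-image test can be illustrated with the short sketch below. It assumes that M1 is the smaller of the two differences |SV(j) - minL| and |SV(j) - minR|, and that minL and minR are the nearest local minima reached by walking downhill from the maximum on each side; the function name is hypothetical.

```python
def key_images(sv, threshold):
    """Return the indices j whose score is a local maximum standing out from both minima."""
    keys = []
    for j in range(1, len(sv) - 1):
        # Keep only local maxima of the score vector.
        if not (sv[j] > sv[j - 1] and sv[j] >= sv[j + 1]):
            continue
        # Walk downhill to the left to find minL.
        left = j
        while left > 0 and sv[left - 1] <= sv[left]:
            left -= 1
        # Walk downhill to the right to find minR.
        right = j
        while right < len(sv) - 1 and sv[right + 1] <= sv[right]:
            right += 1
        m1 = min(abs(sv[j] - sv[left]), abs(sv[j] - sv[right]))
        if m1 > threshold:
            keys.append(j)
    return keys
```

With this reading, a small bump between two deep valleys is rejected because its height above the nearer minimum does not exceed the threshold.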
Returning to indexing a multimedia document comprising
audio components, the document is sampled and decomposed into
frames, which frames are subsequently grouped together into
clips, each being characterized by a term t_i constituted by a
parameter vector.

A frame may comprise about 512 samples to about 2,048
samples of the sampled audio document.

Advantageously, the parameters taken into account to
define the terms t_i comprise time information corresponding to
at least one of the following parameters: the energy of the
audio signal frames, the standard deviation of frame energies
in the clips, the sound variation ratio, the low energy ratio,
the rate of oscillation about a predetermined value, the high
rate of oscillation about a predetermined value, the
difference between the number of oscillation rates above and
below the mean oscillation rate for the frames of the clips,
the variance of the oscillation rate, the ratio of silent
frames.

Nevertheless, in alternative manner or in addition, the
parameters taken into account for defining the terms t_i
advantageously comprise frequency information corresponding to
at least one of the following parameters: the center of
gravity of the frequency spectrum of the short Fourier
transform of the audio signal, the bandwidth of the audio
signal, the ratio between the energy in a frequency band and
the total energy in the entire frequency band of the sampled
audio signal, the mean value of spectrum variation of two
adjacent frames in a clip, the cutoff frequency of a clip.

More particularly, the parameters taken into account for
defining the terms t_i may comprise at least energy modulation
at 4 hertz (Hz).

Other characteristics and advantages of the invention
appear from the following description of particular
implementations, given as examples and with reference to the
accompanying drawings, in which:

• Figure 1 is a block diagram showing the process of
producing a dictionary of concepts from a document base in
accordance with the invention;

• Figure 2 shows the principle on which a concept base is
built up from terms;

• Figure 3 is a block diagram showing the process of
structuring a concept dictionary in accordance with the
invention;

• Figure 4 shows the structuring of a fingerprint base
implemented in the context of the method of the invention;

• Figure 5 is a flow chart showing the various steps of
building a fingerprint base;

• Figure 6 is a flow chart showing the various steps of
identifying documents;

• Figure 7 is a flow chart showing how a first list of
responses is selected;

• Figure 8 is a flow chart showing the various steps in a
stage of indexing documents in accordance with the method of
the invention;

• Figure 9 is a flow chart showing the various steps of
extracting terms when processing images;

• Figure 10 is a diagram summarizing the process of
decomposing an image that is regular and homogeneous;

• Figures 11 to 13 show three examples of images
containing different types of elements;

• Figures 14a to 14f show respectively an example of an
original image, an example of the image after processing
taking account of the gradient modulus, and four examples of
images processed with dismantling of the boundary zones of the
image;

• Figure 15a shows a first example of an image containing
one directional element;

• Figure 15a1 is a 3D view of the spectrum of the image
of Figure 15a;

• Figure 15b is a second example of an image containing
one directional element;

• Figure 15b1 is an image of the Fourier modulus for the
image of Figure 15b;

• Figure 15c shows a third example of an image containing
two directional elements;

• Figure 15c1 is an image of the Fourier modulus of the
image of Figure 15c;

• Figure 16 shows the projection directions for pairs of
integers (α, β) in the context of calculating the discrete
Fourier transform of an image;

• Figure 17 shows an example of the projection mechanism
with the example of a pair of entries (α_k, β_k) = (2, -1);

• Figure 18a1 shows an example of an image containing
periodic components;

• Figure 18a2 shows the image of the modulus of the
discrete Fourier transform of the image of Figure 18a1;

• Figure 18b1 shows an example of a synthetic image
containing one periodic component;

• Figure 18b2 is a 3D view of the discrete Fourier
transform of the image of Figure 18b1, showing a symmetrical
pair of peaks;

• Figure 19 is a flow chart showing the various steps in
processing an image with a vector being established that
characterizes the spatial distribution of iconic properties of
the image;

• Figure 20 shows an example of an image being
partitioned and of a characteristic of said image being
created;

• Figure 21 shows a rotation through 90° of the
partitioned image of Figure 20 and the creation of a vector
characterizing this image;
• Figure 22 shows a sound signal made up of frames being
decomposed into clips;

• Figure 23a shows the variation in the energy of a
speech signal;

• Figure 23b shows the variation in the energy of a music
signal;

• Figure 24a shows the zero crossing rate for a speech
signal;

• Figure 24b shows the zero crossing rate for a music
signal;

• Figure 25a shows the center of gravity of the frequency
spectrum of the short Fourier transform of a speech signal;

• Figure 25b shows the center of gravity of the frequency
spectrum of the short Fourier transform of a music signal;

• Figure 26a shows the bandwidth of a speech signal;

• Figure 26b shows the bandwidth of a music signal;

• Figure 27a shows, for three frequency sub-bands 1, 2,
and 3, the energy ratio for each frequency sub-band over the
total energy over the entire frequency band, for a speech
signal;

• Figure 27b shows, for three frequency sub-bands 1, 2,
and 3, the energy ratio for each frequency sub-band over the
total energy over the entire frequency band, for a music
signal;

• Figure 28a shows the spectral flux of a speech signal;

• Figure 28b shows the spectral flux of a music signal;

• Figure 29 is a graph showing the definition of the
cutoff frequency of a clip; and

• Figure 30 shows energy modulation around 4 Hz for an
audio signal.

With reference to Figures 1 to 5, the description begins
with the general principle of the method of indexing
multimedia documents in accordance with the invention that
leads to a fingerprint base being built up, each indexed
document being associated with a fingerprint that is specific
thereto.

Starting from a multimedia document base 1, a first step
2 consists in identifying and extracting terms t_i for each
document, where the terms are constituted by vectors
characterizing properties of the document to be indexed.

By way of example, a description is given below with
reference to Figures 22 to 30 of the manner in which it is
possible to identify and extract terms t_i for a sound document.

An audio document 140 is initially decomposed into frames
160 that are subsequently grouped together into clips 150,
each of which is characterized by a term constituted by a
vector of parameters (Figure 22). An audio document 140 is
thus characterized by a set of terms t_i that are stored in a
term base 3 (Figure 1).

Audio documents from which a characteristic vector is
extracted can be sampled, for example, at 22,050 Hz in order
to avoid any aliasing effect. The document is then subdivided
into a set of frames, with the number of samples per frame
being determined as a function of the type of file for
analysis.

For an audio document rich in frequencies and containing
many variations, e.g. as in films, variety shows, or even
sporting events, the number of samples in a frame should be
small, e.g. about 512 samples. In contrast, for an audio
document that is homogeneous, e.g. containing only speech or
only music, this number should be large, e.g. about 2,048
samples.

An audio document clip can be characterized by various
parameters serving to make up the terms and characterizing
time or frequency information.

It is possible to use all or some of the parameters that
are mentioned below in order to form vectors of parameters
constituting the terms identifying successive clips of the
sampled audio document.

The energy of the frames of the audio signal constitutes
a first parameter representing time information.

The energy of the audio signal varies a great deal in
speech whereas it is rather stable in music. This thus serves
to discriminate between speech and music, and also to detect
silences. Energy can be coupled with another time parameter
such as the rate of oscillation (RO) about a value, which may
correspond for example to the zero crossing rate (ZCR). A low
RO and high energy are synonymous with voiced sound, whereas a
high RO represents a non-voiced zone.

Figure 23a shows a signal 141 illustrating variation of
energy for a speech signal.

Figure 23b shows a signal 142 that illustrates variation
in energy for a music signal.

Let N be the number of samples in a frame, then volume or
energy E(n) is defined by:

$$E(n) = \frac{1}{N} \sum_{i=0}^{N-1} S_n^2(i) \tag{2}$$

where S_n(i) represents the value of sample i in the frame of
index n of an audio signal.
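
As an illustration, the framing and the frame energy can be sketched as follows, assuming that equation (2) is the mean of the squared samples of a frame. Non-overlapping frames and the dropping of an incomplete last frame are assumptions, and the function names are hypothetical.

```python
import numpy as np

def frame_signal(samples, frame_len):
    """Cut a 1-D signal into consecutive non-overlapping frames (incomplete tail dropped)."""
    samples = np.asarray(samples)
    n_frames = len(samples) // frame_len
    return samples[: n_frames * frame_len].reshape(n_frames, frame_len)

def frame_energy(frames):
    """E(n) = (1/N) * sum of S_n(i)^2 over the N samples of frame n."""
    return np.mean(frames.astype(float) ** 2, axis=1)
```

The frame length would typically be chosen between about 512 and about 2,048 samples, as indicated above.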
Other parameters representative of time information can
be deduced from energy, such as, for example:

• the standard deviation of frame energies in the clips
(also referred to as VSTD), which is defined as the variance
of frame volumes in a clip normalized relative to the maximum
frame volume of the clip;

• the sound variation ratio (SVR) which is constituted by
the difference between the maximum and the minimum frame
volumes of a clip divided by the maximum volume of said
frames; and

• the low energy ratio (LER) which is the percentage of
frames of volume lower than a threshold (e.g. set at 95% of
the mean volume of a clip).
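
The three clip-level statistics can be sketched as below. The reading of VSTD as the variance of the frame volumes after division by the maximum volume is an assumption (the wording also allows other normalizations), LER is returned as a fraction between 0 and 1 rather than a percentage, and the function name is hypothetical.

```python
import numpy as np

def clip_energy_stats(volumes, ler_factor=0.95):
    """volumes: per-frame energies E(n) of one clip (maximum must be non-zero)."""
    v = np.asarray(volumes, dtype=float)
    vmax = v.max()
    vstd = np.var(v / vmax)                      # assumed normalization by the maximum volume
    svr = (vmax - v.min()) / vmax                # sound variation ratio
    ler = np.mean(v < ler_factor * v.mean())     # share of frames below 95% of the mean volume
    return vstd, svr, ler
```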

Other parameters enable the time aspect of a clip to be
characterized, in particular the rate of oscillation about a
predetermined value which, when said predetermined value is
zero, defines the zero crossing rate (ZCR).

The ZCR may also be defined as the number of times the
wave crosses zero.

$$Z(n) = \frac{1}{2} \left[ \sum_{i=1}^{N-1} \left| \operatorname{sign}(S_n(i)) - \operatorname{sign}(S_n(i-1)) \right| \right] \frac{f_s}{N} \tag{3}$$

where:

S_n(i): value of sample i in frame n;
N: number of samples in a frame;
f_s: sampling frequency.
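
A minimal sketch of the zero crossing rate follows, assuming the usual form in which half the summed absolute sign changes between consecutive samples is scaled by f_s / N. The function name is hypothetical.

```python
import numpy as np

def zero_crossing_rate(frame, fs):
    """Z(n) = 0.5 * sum_i |sign(S_n(i)) - sign(S_n(i-1))| * fs / N."""
    signs = np.sign(np.asarray(frame, dtype=float))
    # Each full sign change contributes 2 to the sum, hence the factor 0.5.
    # A sample exactly equal to zero has sign 0 and counts as half a crossing.
    crossings = 0.5 * np.sum(np.abs(np.diff(signs)))
    return crossings * fs / len(frame)
```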

This characteristic is frequently used for distinguishing
between speech and music. Sudden variations in ZCR are
representative of alternations between voiced and non-voiced
sound, and thus of the presence of speech. For speech, ZCR is
low in voiced zones and very high for non-voiced zones,
whereas for music, variations in ZCR are very small.

Figure 24a shows a curve 143 representing an example of
ZCR for a speech signal.

Figure 24b shows a curve 144 representing an example of
ZCR for a music signal.

Another parameter characterizing the time aspect of a
clip may be constituted by a high rate of oscillation about a
predetermined value which, when said predetermined value is
zero, defines a high zero crossing rate (HZCR).

HZCR may be defined as being the ratio of the number of
frames for which the ZCR has a value α times (e.g. 1.5 times)
greater than the mean ZCR of the clip (1 s):

$$HZCR = \frac{1}{2N} \sum_{n=0}^{N-1} \left[ \operatorname{sgn}(ZCR(n) - 1.5\,avZCR) + 1 \right] \tag{4}$$

such that:

$$avZCR = \frac{1}{N} \sum_{n=0}^{N-1} ZCR(n) \tag{5}$$

with:

n: frame index;
N: number of frames in a clip.
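
The high zero crossing rate can be sketched as follows, assuming equations (4) and (5) have the form HZCR = (1/2N) Σ [sgn(ZCR(n) - 1.5 avZCR) + 1] with avZCR the mean ZCR of the clip. The function name and the `factor` parameter are hypothetical.

```python
import numpy as np

def high_zcr_ratio(zcr, factor=1.5):
    """Fraction of frames whose ZCR exceeds factor times the mean ZCR of the clip."""
    z = np.asarray(zcr, dtype=float)
    av_zcr = z.mean()
    # sgn(...) + 1 equals 2 for frames above the limit and 0 for frames below it.
    return np.sum(np.sign(z - factor * av_zcr) + 1) / (2 * len(z))
```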

For speech segments, clips are of 0 to 200 seconds (s)
with an HZCR of around 0.15.

In contrast, for music segments, clips are of 200 s to
350 s and the HZCR lies around 0.05 and is generally almost
zero.

For environmental sound, the segments corresponding to
the clips are 351 s to 450 s.

HZCR is low for white noise and large for a deafening
sound (e.g. a drum).

It is also possible to define a parameter ZCRD which is
constituted by the difference between the number of ZCRs above
and below the mean ZCR for the frames of a clip, and the
parameter ZCRV which is constituted by the variance of the ZCR.

Another parameter characterizing the time aspect of a
clip is the silent frame ratio (SFR) which is the percentage
of non-silent frames in a clip.

A frame is said to be non-silent if its volume exceeds a
certain threshold (10) and if the value of its ZCR is below a
threshold ZCR.

Thus, the ratio of non-silent frames in a clip serves to
detect silence.

Other statistical properties of ZCR can be used as
characteristic parameters such as:

i) the third order moment of the mean; and

ii) the number of ZCRs exceeding a certain threshold.

The parameters taken into account for defining the terms
t_i may also comprise frequency information taking account of a
fast Fourier transform (FFT) calculated for the audio signal.

Thus, a parameter known as the spectral centroid (SC) may
be defined as being the center of gravity of the frequency
spectrum of the short Fourier transform (SFT) of the audio
signal:

$$SC(n) = \frac{\sum_{i=0}^{N-1} i\,S_n(i)}{\sum_{i=0}^{N-1} S_n(i)} \tag{6}$$

such that S_n(i): spectral power of frame No. i of clip No. n.

The parameter SC is high for music, since the high
frequencies are spread over a wider zone than for speech (in
general 6 octaves for music and only 3 for speech). This is
associated with the sensation of brightness for the sound
heard. This is an important perceptible attribute for
characterizing tone color or "timbre".

Figure 25a shows a curve 145 representing an example of
SC for a speech signal.

Figure 25b shows a curve 146 representing an example of
SC for a music signal.

Another parameter is constituted by the bandwidth BW
which can be calculated from the variance in the preceding
parameter SC(n).

$$BW^2(n) = \frac{\sum_{i=0}^{N-1} (i - SC(n))^2\,S_n(i)}{\sum_{i=0}^{N-1} S_n(i)} \tag{7}$$

The bandwidth BW is important both in music and in
speech.
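
The spectral centroid and the bandwidth can be computed together, as in the sketch below. It assumes that SC is the power-weighted mean of the index i and that BW² is the power-weighted variance of i about SC; treating i as a position within the spectral power sequence is also an assumption, and the function name is hypothetical.

```python
import numpy as np

def spectral_centroid_bandwidth(power):
    """power: spectral power values S_n(i), i = 0 .. N-1 (their sum must be non-zero)."""
    s = np.asarray(power, dtype=float)
    i = np.arange(len(s))
    sc = np.sum(i * s) / np.sum(s)                       # center of gravity of the spectrum
    bw = np.sqrt(np.sum((i - sc) ** 2 * s) / np.sum(s))  # square root of BW^2
    return sc, bw
```

Both values are expressed in index units; multiplying by the spacing between successive indices would convert them to hertz.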

Figure 26a shows a curve 147 presenting an example of the
bandwidth of a speech signal.

Figure 26b shows a curve 148 presenting an example of the
bandwidth of a music signal.
Another useful parameter is constituted by the ratio SBER
between the energy in a sub-band of frequency i and the total
energy in the entire frequency band of the sampled audio
signal.

With consideration to the perceptual properties of the
human ear, the frequency band is decomposed into four
sub-bands that correspond to the filters of the cochlea. When
the sampling frequency is 22,050 Hz, the frequency bands are:
0-630 Hz, 630 Hz-1,720 Hz, 1,720 Hz-4,400 Hz, and
4,400 Hz-11,025 Hz. For each of these bands, its SBERi energy
is calculated corresponding to the ratio of the energy in the
band over the energy over the entire frequency band.
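
The sub-band energy ratios can be sketched with a discrete Fourier transform of a frame, using the four bands listed above. The use of the squared FFT magnitude as the energy, the handling of the band edges, and the names `BAND_EDGES_HZ` and `sub_band_energy_ratios` are assumptions made for illustration.

```python
import numpy as np

BAND_EDGES_HZ = [0, 630, 1720, 4400, 11025]

def sub_band_energy_ratios(frame, fs=22050):
    """Return the four SBER values: energy in each sub-band over the total energy."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    total = spectrum.sum()
    ratios = []
    for lo, hi in zip(BAND_EDGES_HZ[:-1], BAND_EDGES_HZ[1:]):
        # The top band is left open-ended so that the Nyquist bin is always counted.
        upper = np.inf if hi == BAND_EDGES_HZ[-1] else hi
        mask = (freqs >= lo) & (freqs < upper)
        ratios.append(spectrum[mask].sum() / total)
    return ratios
```

For a pure tone at 1,000 Hz, nearly all of the energy falls in the second band (630 Hz-1,720 Hz), and the four ratios sum to one.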

Figure 27a shows three curves 151, 152, and 153
representing for three frequency sub-bands 1, 2, and 3, the
energy ratio in each frequency sub-band over the total energy
of the entire frequency band, for an example of a speech
signal.

Figure 27b shows three curves 154, 155, and 156 showing
for three frequency sub-bands 1, 2, and 3, the energy ratio in
each frequency sub-band over the total energy in the entire
frequency band, for an example of a music signal.

Another parameter is constituted by the spectral flux SF
which is defined as the mean value of spectral variation
between two adjacent frames in a clip:

$$SF(n) = \frac{1}{N} \sum_{i=1}^{N-1} \left[ \log(S_n(i) + \delta) - \log(S_n(i-1) + \delta) \right]^2 \tag{8}$$

where:

δ: a constant of small value;

S_n(i): spectral power of frame No. i of clip No. n.
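
A sketch of the spectral flux is given below. It assumes that equation (8) sums the squared differences of the logarithms of consecutive spectral power values and divides by N; the normalization, the summation limits, and the default value of the small constant are assumptions, and the function name is hypothetical.

```python
import numpy as np

def spectral_flux(power, delta=1e-6):
    """power: spectral power values S_n(i) of the successive frames of one clip."""
    s = np.asarray(power, dtype=float)
    log_s = np.log(s + delta)            # delta avoids log(0) for silent frames
    # Squared log-difference between each frame and the previous one, normalized by N.
    return np.sum(np.diff(log_s) ** 2) / len(s)
```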
The spectral flux of speech is generally greater than
that of music, and the spectral flux of environmental sound is
greater still. It varies considerably compared with the other
two signals.

Figure 28a shows a curve 157 representing the spectral
flux of an example of a speech signal.

Figure 28b shows a curve 158 representing the spectral
flux of an example of a music signal.

Another useful parameter is constituted by the cutoff
frequency of a clip (CCF).

Figure 29 shows a curve 149 illustrating the amplitude
spectrum as a function of frequency fe, and the cutoff
frequency fc is the frequency beneath which 95% of the
spectral energy (spectral power) is concentrated.

In order to determine the cutoff frequency of a clip, the
clip Fourier transform DS(n) is calculated.

$$DS(n) = \sum_{i=0}^{N-1} S_n^2(i) \tag{9}$$

The cutoff frequency fc is determined by:

$$\sum_{i=0}^{fc} S_n^2(i) \geq 0.95 \times DS \tag{10}$$

and

$$\sum_{i=0}^{fc-1} S_n^2(i) < 0.95 \times DS \tag{11}$$

The CCF is higher for a non-voiced sound (richer in high
frequencies) than for a voiced sound (presence of speech in
which power is concentrated in the low frequencies).

This measurement makes it possible to characterize
changes between voiced and non-voiced periods in speech since
this value is low for clips containing music only.
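
The cutoff frequency defined above (the frequency beneath which 95% of the spectral energy is concentrated) can be located with a cumulative sum, as in this sketch. It assumes that the energy is the sum of the squared spectral values and returns an index rather than a frequency in hertz; the function name is hypothetical.

```python
import numpy as np

def cutoff_index(power, fraction=0.95):
    """Smallest index fc whose cumulative energy reaches fraction of the total DS."""
    s2 = np.asarray(power, dtype=float) ** 2
    cumulative = np.cumsum(s2)
    ds = cumulative[-1]                  # total energy DS of the clip
    # First index where the running sum is >= fraction * DS; the sum up to the
    # previous index is therefore still below that limit.
    return int(np.searchsorted(cumulative, fraction * ds, side="left"))
```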

Other parameters can also be taken into account for
defining the terms t_i of an audio document, such as energy
modulation around 4 Hz which constitutes a parameter coming
simultaneously from frequency analysis and from time analysis.

The 4 Hz energy modulation (4EM) is calculated from the
volume contour using the following formula:



$$4EM = \frac{\sum_{i=0} \left( \sum_{j=0} W(j)\, S_n(j + i \times T) \right)}{\sum_{i=0} S_n(i)} \tag{12}$$

where:

S_n(i): spectral power of frame No. i of clip No. n;
W(j): triangular window centered on 4 Hz;
T: width of a clip.

Speech has a 4 EM that is greater than music since, for 
speech, syllable changes take place at around 4 Hz. 

2 0 A syllable is a combination of a zone of low energy 

(consonant) and a zone of high energy (vowel) . 

Figure 30 shows a curve 161 representing an example of an
audio signal and a curve 162 showing for said signal the
energy modulation around 4 Hz.

Multimedia documents including audio components are
described above.

When indexing multimedia documents having video signals,
it is possible to select terms t_i that are constituted by
key-images representing groups of consecutive homogeneous images.

The terms t_i can in turn represent for example: dominant
colors, textural properties, or the structures of dominant
zones in the key-images of the video document.

In general, with images as described in greater detail
below, the terms can represent dominant colors, textural
properties, or the structures of dominant zones in an image.
Several methods can be implemented in alternation or
cumulatively, either over an entire image or over portions of
the image, in order to determine the terms t_i before
characterizing the image.

For a document containing text, the terms t_i may be
constituted by words of the spoken or written language, by
numbers, or by other identifiers constituted by combinations
of characters (e.g. combinations of letters and digits).

Consideration is given again to indexing a multimedia
document comprising video signals, in which terms t_i are
selected that are constituted by key-images representing
groups of consecutive homogeneous images, and concepts c_i are
determined by grouping together terms t_i.

Detecting key-images relies on the way images in a video
document are grouped together in groups each of which contains
only homogeneous images. From each of these groups one or
more images (referred to as key-images) are extracted that are
representative of the video document.

The grouping together of video document images relies on
producing a score vector SV representing the content of the
video, characterizing variation in consecutive images of the
video (the elements SV_i represent the difference between the
content of the image of index i and the image of index i-1),
with SV_i being equal to zero when the contents of the images
im_i and im_{i-1} are identical, and being large when the
difference between the two contents is large.

In order to calculate the signal SV, the red, green, and
blue (RGB) bands of each image im_i of index i in the video are
added together to constitute a single image referred to as
TR_i. Thereafter the image TR_i is decomposed into a plurality
of frequency bands so as to retain only the low frequency
component LTR_i. To do this, two mirror filters (a low pass
filter LP and a high pass filter HP) are used which are
applied in succession to the rows and to the columns of the
image. Two types of filter are considered: a Haar wavelet
filter and the filter having the following algorithm:



Row scanning

From TR_k the low image is produced
For each point a_{2×i,j} of the image TR, do
    Calculate the point b_{i,j} of the low frequency image;
    b_{i,j} takes the mean value of a_{2×i,j-1}, a_{2×i,j},
    and a_{2×i,j+1}.

Column scan

From two low images, the image LTR_k is produced
For each point b_{i,2×j} of the image TR, do
    Calculate the point bb_{i,j} of the low frequency image;
    bb_{i,j} takes the mean value of b_{i,2×j-1}, b_{i,2×j},
    and b_{i,2×j+1}.

The row and column scans are applied as often as desired.
The number of iterations n depends on the resolution of the
video images. For images having a size of 512 x 512, n can be
set at three.
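The row and column scans can be sketched as follows. This is only an illustration under stated assumptions: the image is a list of rows of numbers, neighbors falling outside the image are clamped to the border (the text does not say how edges are handled), and the function names are hypothetical.

```python
def row_scan(image):
    """Keep every second row; each output point is the mean of the source
    point and its left and right neighbours (border clamped: assumption)."""
    width = len(image[0])
    low = []
    for i in range(len(image) // 2):
        src = image[2 * i]
        low.append([
            (src[max(j - 1, 0)] + src[j] + src[min(j + 1, width - 1)]) / 3.0
            for j in range(width)
        ])
    return low


def column_scan(low):
    """Keep every second column; each output point is the mean of the source
    point and its two horizontal neighbours in the row-scanned image."""
    width = len(low[0])
    return [
        [
            (row[max(2 * j - 1, 0)] + row[2 * j] + row[min(2 * j + 1, width - 1)]) / 3.0
            for j in range(width // 2)
        ]
        for row in low
    ]


def low_frequency_image(image, iterations=3):
    """Apply the row scan followed by the column scan `iterations` times."""
    for _ in range(iterations):
        image = column_scan(row_scan(image))
    return image


# Hypothetical 8 x 8 uniform image, reduced twice to 2 x 2.
flat = [[6.0] * 8 for _ in range(8)]
ltr = low_frequency_image(flat, iterations=2)
```

Each iteration halves both dimensions, so three iterations reduce a 512 x 512 image to 64 x 64.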

The result image LTR_i is projected in a plurality of
directions to obtain a set of vectors V_k, where k is the
projection angle (element j of V_0, the vector obtained
following horizontal projection of the image, is equal to the
sum of all of the points of row j in the image). The
direction vectors of the image LTR_i are compared with the
direction vectors of the image LTR_{i-1} to obtain a score i
which measures the similarity between the two images. This
score is obtained by averaging all of the vector distances
having the same direction: for each k, the distance is
calculated between the vector V_k of image i and the vector V_k
of image i-1, and then all of these distances are averaged.

The set of all the scores constitutes the score vector
SV: element i of SV measures the similarity between the image
LTR_i and the image LTR_{i-1}. The vector SV is smoothed in order
to eliminate irregularities due to the noise generated by
manipulating the video.
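A minimal sketch of the score vector follows. It makes several assumptions that the text leaves open: only the horizontal and vertical projections are used, the distance between two projection vectors is Euclidean, the first element of SV is set to zero because the first image has no predecessor, and smoothing is a simple moving average. All function names are hypothetical.

```python
import math


def projections(image):
    """Horizontal projection (row sums) and vertical projection (column sums)."""
    rows = [sum(row) for row in image]
    cols = [sum(col) for col in zip(*image)]
    return [rows, cols]


def score(prev_image, image):
    """Mean of the distances between projection vectors of the same direction."""
    dists = [
        math.dist(a, b)
        for a, b in zip(projections(prev_image), projections(image))
    ]
    return sum(dists) / len(dists)


def score_vector(images):
    """SV[i] compares image i with image i-1; SV[0] is 0 by convention here."""
    return [0.0] + [score(images[i - 1], images[i]) for i in range(1, len(images))]


def smooth(values, radius=1):
    """Moving average whose window shrinks at both ends of the vector."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


# Hypothetical low frequency images: two identical frames, then a change.
frame_a = [[1.0, 1.0], [1.0, 1.0]]
frame_b = [[1.0, 1.0], [1.0, 1.0]]
frame_c = [[4.0, 4.0], [4.0, 4.0]]
sv = score_vector([frame_a, frame_b, frame_c])
sv_smoothed = smooth(sv)
```

Identical consecutive frames give a zero score, and the change of content between the second and third frames gives a large one.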
There follows a description of an example of grouping
images together and extracting key-images.

The vector SV is analyzed in order to determine the
key-images that correspond to the maxima of the values of SV. An
image of index j is considered as being a key-image if the
value SV(j) is a maximum, if SV(j) is situated between two
minimums minL (left minimum) and minR (right minimum), and if
the minimum M1, where:

M1 = min(|SV(j) - minL|, |SV(j) - minR|)

is greater than a given threshold.

In order to detect key-images, minL is initialized with
SV(0) and then the vector SV is scrolled through from left to
right. At each step, the index j corresponding to the maximum
value situated between two minimums (minL and minR) is
determined, and then as a function of the result of the
equation defining M1 it is decided whether or not to consider
j as being an index for a key-image. It is possible to take a
group of several adjacent key-images, e.g. key-images having
indices j-1, j, and j+1.

Three situations arise if the minimum of the two slopes,
defined by the two minimums (minL and minR) and the maximum
value, is not greater than the threshold:

i) if |SV(j) - minL| is less than the threshold and minL
does not correspond to SV(0), then the maximum SV(j) is
ignored and minR becomes minL;

ii) if |SV(j) - minL| is greater than the threshold and
if |SV(j) - minR| is less than the threshold, then minR and
the maximum SV(j) are retained and minL is ignored unless the
closest maximum to the right of minR is greater than a
threshold. Under such circumstances, minR is also retained
and j is declared as being an index of a key-image. When minR
is ignored, minR takes the value closest to the minimum
situated to the right of minR; and

iii) if both slopes are less than the threshold, minL is
retained and minR and j are ignored.
After selecting a key-image, the process is iterated. At
each iteration, minR becomes minL.
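The basic acceptance test can be sketched as below. This is a deliberately simplified illustration: it scans SV from left to right, takes each local maximum with the minimum on either side, and keeps the index when the smaller of the two slopes exceeds the threshold. The three fallback situations i) to iii) are not implemented, the left minimum is tracked as the lowest value seen since the previous maximum, and the function name and sample values are hypothetical.

```python
def key_image_indices(sv, threshold):
    """Return the indices j of local maxima of sv for which
    min(|sv[j] - minL|, |sv[j] - minR|) is greater than the threshold."""
    keys = []
    n = len(sv)
    min_left = sv[0]                          # minL is initialised with SV(0)
    j = 1
    while j < n - 1:
        min_left = min(min_left, sv[j])       # lowest value since the last maximum
        if sv[j - 1] < sv[j] >= sv[j + 1]:    # local maximum at index j
            k = j + 1
            while k < n - 1 and sv[k + 1] <= sv[k]:
                k += 1                        # descend to the right minimum
            min_right = sv[k]
            m1 = min(abs(sv[j] - min_left), abs(sv[j] - min_right))
            if m1 > threshold:
                keys.append(j)
            min_left = min_right              # minR becomes minL
            j = k + 1
        else:
            j += 1
    return keys


# Hypothetical smoothed score vector with two strong peaks and one weak bump.
sv = [0.0, 1.0, 5.0, 1.0, 1.5, 1.0, 6.0, 0.5]
keys = key_image_indices(sv, threshold=2.0)
```

The weak bump at index 4 is rejected because both of its slopes are below the threshold, while the peaks at indices 2 and 6 are kept.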

With reference again to Figure 1, starting from a term
base 3 having P terms, the terms t_i are processed in a step 4
and grouped together into concepts c_i (Figure 2) for storing in
a concept dictionary 5. The idea at this point is to generate
a set of signatures characterizing a class of documents. The
signatures are descriptors which, e.g. for an image, represent
color, shape, and texture. A document can then be
characterized and represented by the concepts of the
dictionary.

A fingerprint of a document can then be formed by the
signature vectors of each concept of the dictionary 5. The
signature vector is constituted by the documents where the
concept c_i is present and by the positions and the weight of
said concept in the document.

The terms t_i extracted from a document base 1 are stored
in a term base 3 and processed in a module 4 for extracting
concepts c_i which are themselves grouped together in a concept
dictionary 5. Figure 2 shows the process of constructing a
concept base c_i (1 ≤ i ≤ m) from terms t_j (1 ≤ j ≤ n)
presenting similarity scores w_ij.

The module for producing the concept dictionary receives
as input the set P of terms from the base 3, and the maximum
desired number N of concepts is set by the user. Each concept c_i
is intended to group together terms that are neighbors from
the point of view of their characteristics.

In order to produce the concept dictionary, the first
step is to calculate the distance matrix T between the terms
of the base 3, with this matrix being used to create a
partition of cardinal number equal to the desired number N of
concepts.

The concept dictionary is set up in two stages:

• decomposing P into N portions P = P_1 ∪ P_2 ... ∪ P_N; and

• optimizing the partition that decomposes P into M
classes P = C_1 ∪ C_2 ... ∪ C_M with M less than or equal to P.

The purpose of the optimization process is to reduce the
error in the decomposition of P into N portions {P_1, P_2, ..., P_N}
where each portion P_i is represented by the term t_i which is
taken as being a concept, with the error that is then
committed being equal to the following expression:

$$\varepsilon = \sum_{i=1}^{N} \varepsilon_{t_i}, \quad \text{where} \quad \varepsilon_{t_i} = \sum_{t_j \in P_i} d^2(t_i, t_j)$$

is the error committed when replacing the terms t_j of P_i by t_i.
It is possible to decompose P into N portions in such a
manner as to distribute the terms so that the terms that are
furthest apart lie in distinct portions while terms that are
closer together lie in the same portions.

Step 1 of decomposing the set of terms P into two
portions P_1 and P_2 is described initially:

a) the two terms t_i and t_j in P that are farthest apart
are determined, this corresponding to the greatest distance
of the matrix T;

b) for each t_k of P, t_k is allocated to P_1 if the distance
D_ki is smaller than the distance D_kj, otherwise it is allocated
to P_2.

Step 1 is iterated until the desired number of portions
has been obtained, and on each iteration steps a) and b) are
applied to the terms of set P_1 and set P_2.
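Steps a) and b) and their iteration can be sketched as follows. The sketch assumes that terms are numeric tuples compared with the Euclidean distance, and that the portion with the largest diameter is the one split next (the text does not fix the order in which portions are split). The function names and sample terms are hypothetical.

```python
import math
from itertools import combinations


def split_in_two(terms):
    """Steps a) and b): seed with the two farthest terms, then allocate each
    term to the portion of the nearer seed."""
    t_i, t_j = max(combinations(terms, 2), key=lambda pair: math.dist(*pair))
    part_1 = [t for t in terms if math.dist(t, t_i) < math.dist(t, t_j)]
    part_2 = [t for t in terms if math.dist(t, t_i) >= math.dist(t, t_j)]
    return part_1, part_2


def diameter(terms):
    """Greatest pairwise distance inside a portion (0 for a single term)."""
    return max((math.dist(a, b) for a, b in combinations(terms, 2)), default=0.0)


def decompose(terms, n_portions):
    """Split repeatedly until n_portions portions exist."""
    portions = [list(terms)]
    while len(portions) < n_portions:
        widest = max(portions, key=diameter)   # assumption: widest portion first
        if diameter(widest) == 0.0:
            break                              # nothing left to separate
        portions.remove(widest)
        portions.extend(split_in_two(widest))
    return portions


# Hypothetical two-dimensional terms forming three visible groups.
terms = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (5.0, 20.0)]
portions = decompose(terms, 3)
```

For simplicity the distances are recomputed on demand rather than read from a precomputed matrix T; for a large term base the matrix would avoid repeated work.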
The optimization stage is as follows.

The starting point of the optimization process is the N
disjoint portions of P {P_1, P_2, ..., P_N} and the N terms
{t_1, t_2, ..., t_N} representing them, and it is used for the
purpose of reducing the error in decomposing P into
{P_1, P_2, ..., P_N} portions.

The process begins by calculating the centers of gravity
c_i of the P_i. Thereafter the error

$$\varepsilon_{c_i} = \sum_{t_j \in P_i} d^2(c_i, t_j)$$

is calculated and compared with ε_{t_i}, and t_i is replaced
by c_i if ε_{c_i} is less than ε_{t_i}. Then, after calculating
the new matrix T and if convergence is not reached,
decomposition is performed. The stop condition is defined by:

$$\frac{\varepsilon c_t - \varepsilon c_{t+1}}{\varepsilon c_t} < \text{threshold}$$

where the threshold is about 10^-3, εc_t being the error
committed at the instant t that represents the iteration.
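One pass of this optimization can be sketched as below, reading the stop condition as the relative decrease of the total error between two successive iterations. The sketch assumes Euclidean distance on numeric tuples and leaves out the re-decomposition of P around the new representatives; all names and sample values are hypothetical.

```python
import math


def squared_error(representative, portion):
    """Sum of squared distances between a representative and its portion."""
    return sum(math.dist(representative, t) ** 2 for t in portion)


def centroid(portion):
    """Center of gravity of a portion of terms."""
    return tuple(sum(coords) / len(portion) for coords in zip(*portion))


def refine(portions, representatives):
    """Replace t_i by the center of gravity c_i whenever that lowers the error."""
    refined = []
    for portion, t_i in zip(portions, representatives):
        c_i = centroid(portion)
        better = squared_error(c_i, portion) < squared_error(t_i, portion)
        refined.append(c_i if better else t_i)
    return refined


def total_error(portions, representatives):
    return sum(squared_error(r, p) for p, r in zip(portions, representatives))


def has_converged(previous_error, current_error, threshold=1e-3):
    """Relative decrease of the error below the threshold (about 10^-3)."""
    if previous_error == 0.0:
        return True
    return (previous_error - current_error) / previous_error < threshold


# Hypothetical portions and their current representative terms.
portions = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 0.0), (10.0, 4.0)]]
representatives = [(0.0, 0.0), (10.0, 2.0)]
before = total_error(portions, representatives)
representatives = refine(portions, representatives)
after = total_error(portions, representatives)
```

In this example only the first representative is replaced, since the second one already coincides with the center of gravity of its portion.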

There follows a matrix T of distances between the terms,
where D_ij designates the distance between term t_i and term t_j.

For multimedia documents having a variety of contents,
Figure 3 shows an example of how the concept dictionary 5 is
structured.

In order to facilitate navigation inside the dictionary 5
and determine quickly during an identification stage the
concept that is closest to a given term, the dictionary 5 is
analyzed and a navigation chart 9 inside the dictionary is
established.

The navigation chart 9 is produced iteratively. The set
of concepts is initially split into two subsets, and then on
each iteration one of the subsets is selected and split in
turn, until the desired number of groups is obtained or
until the stop criterion is satisfied. The stop criterion may
be, for example, that the resulting subsets are all
homogeneous with a small standard deviation. The
final result is a binary tree in which the leaves contain the
concepts of the dictionary and the nodes of the tree contain
the information necessary for traversing the tree during the
stage of identifying a document.

There follows a description of an example of the module 6 
for distributing a set of concepts. 

The set of concepts C is represented in the form of a
matrix M = [c_1, c_2, ..., c_N] ∈ ℝ^(p×N), where c_i ∈ ℝ^p and
c_i represents a concept having p values. Various methods can
be used for obtaining an axial distribution. The first step is
to calculate the center of gravity of C and the axis used for
decomposing the set into two subsets.

The processing steps are as follows:

Step 1: calculating a representative of the matrix M such
as the centroid w of matrix M:

$$w = \frac{1}{N} \sum_{i=1}^{N} c_i \qquad (13)$$

Step 2: calculating the covariance matrix M̃ between the
elements of the matrix M and the representative of the matrix
M, giving in the above special case

$$\tilde{M} = M - w e, \quad \text{where } e = [1, 1, 1, \ldots, 1] \qquad (14)$$

Step 3: calculating an axis for projecting the elements of
the matrix M, e.g. the eigenvector u associated with the
greatest eigenvalue of the covariance matrix.

Step 4: calculating the value p_i = u^T (c_i - w) and
decomposing the set of concepts C into two subsets C1 and C2
as follows:

$$\begin{cases} c_i \in C1 & \text{if } p_i \leq 0 \\ c_i \in C2 & \text{if } p_i > 0 \end{cases} \qquad (15)$$

The data set stored in the node associated with C is {u,
w, |p1|, p2} where p1 is the maximum of all p_i ≤ 0 and p2 is
the minimum of all p_i > 0.

The data set {u, w, |p1|, p2} constitutes the navigation
indicators in the concept dictionary. Thus, during the
identification stage for example, in order to determine the
concept that is closest to a term t_i, the value pt_i = u^T (t_i - w)
is calculated and then the node associated with C1 is
selected if | |pt_i| - |p1| | < | |pt_i| - p2 |, else the node C2
is selected. The process is iterated until one of the leaves
of the tree has been reached.
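The splitting of one node and the navigation test can be sketched as follows. Two assumptions are worth stating: the principal axis is obtained here by power iteration on the covariance of the centered concepts, which is a stand-in for whatever eigenvector routine is actually used, and the set is assumed to contain at least two distinct concepts. The navigation rule is implemented literally as written, on magnitudes. All function names and sample concepts are hypothetical.

```python
def centroid(concepts):
    """Centroid w of the set of concepts."""
    return [sum(col) / len(concepts) for col in zip(*concepts)]


def principal_axis(concepts, w, iterations=200):
    """Dominant eigenvector of the covariance of (c_i - w), by power iteration."""
    centered = [[x - m for x, m in zip(c, w)] for c in concepts]
    p = len(w)
    cov = [[sum(r[a] * r[b] for r in centered) for b in range(p)] for a in range(p)]
    u = [1.0] * p
    for _ in range(iterations):
        v = [sum(cov[a][b] * u[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in v) ** 0.5
        if norm == 0.0:
            break
        u = [x / norm for x in v]
    return u


def project(u, w, x):
    """Value u^T (x - w)."""
    return sum(ui * (xi - wi) for ui, xi, wi in zip(u, x, w))


def split(concepts):
    """Build the node data {u, w, |p1|, p2} and the two subsets C1 and C2."""
    w = centroid(concepts)
    u = principal_axis(concepts, w)
    c1 = [c for c in concepts if project(u, w, c) <= 0]
    c2 = [c for c in concepts if project(u, w, c) > 0]
    p1 = max(project(u, w, c) for c in c1)
    p2 = min(project(u, w, c) for c in c2)
    return {"u": u, "w": w, "p1": abs(p1), "p2": p2, "C1": c1, "C2": c2}


def choose_child(node, term):
    """Select C1 if ||pt| - |p1|| < ||pt| - p2|, else C2."""
    pt = abs(project(node["u"], node["w"], term))
    return "C1" if abs(pt - node["p1"]) < abs(pt - node["p2"]) else "C2"


# Hypothetical concepts: three close together and one far away.
concepts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (13.0, 0.0)]
node = split(concepts)
```

The sample set is deliberately asymmetric so that |p1| and p2 differ; when the two values coincide, the magnitude-based test above cannot distinguish the two children.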

A singularity detector module 8 may be associated with
the concept distribution module 6.

The singularity detector serves to select the set C_i that
is to be decomposed. One of the possible methods consists in
selecting the least compact set.

Figures 4 and 5 show the indexing of a document or a 
document base and the construction of a fingerprint base 10. 

The fingerprint base 10 is constituted by the set of
concepts representing the terms of the documents to be
protected. Each concept c_i of the fingerprint base 10 is
associated with a fingerprint 11, 12, 13 constituted by a data
set such as the number of terms in the documents where the
concept is present, and for each of these documents, a
fingerprint 11a, 11b, 11c is registered comprising the index
of the document pointing to the address of the document, the
number of terms, the number of occurrences of the concept
(frequency), the score, and the concepts that are adjacent
thereto in the document. The score is a mean value of
similarity measurements between the concept and the terms of
the document which are closest to the concept. The index of a
given document which points to the address of said document is
stored in a database 14 containing the addresses of protected
documents.

The process 20 for generating fingerprints or signatures
of the documents to be indexed is shown in Figure 5.

When a document is registered, the pertinent terms are
extracted from the document (step 21), and the concept
dictionary is taken into account (step 22). Each of the terms
t_i of the document is projected into the space of the concept
dictionary in order to determine the concept c_i that represents
the term t_i (step 23).

Thereafter the fingerprint of concept c_i is updated (step
24). This updating is performed depending on whether or not
the concept has already been encountered, i.e. whether it is
present in the documents that have already been registered.
If the concept c_i is not yet present in the database, then
a new entry is created in the database (an entry in the
database corresponds to an object made up of elements which
are themselves objects containing the signature of the concept
in those documents where the concept is present). The newly
created entry is initialized with the signature of the
concept. The signature of a concept in a document is made up
mainly of the following data items: document address, number
of terms, frequency, adjacent concepts, and score.

If the concept c_i exists in the database, then the entry
associated with the concept has added thereto its signature in
the query document, which signature is made up of (document
address, number of terms, frequency, adjacent concepts, and
score).
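The two cases of the update can be sketched with a dictionary keyed by concept. The class `ConceptSignature`, the function `register_signature`, and the sample addresses and concept identifiers are hypothetical; only the five data items of the signature come from the text.

```python
from dataclasses import dataclass


@dataclass
class ConceptSignature:
    """Signature of one concept in one document."""
    document_address: str
    number_of_terms: int
    frequency: int
    adjacent_concepts: list
    score: float


def register_signature(fingerprint_base, concept, signature):
    """Create the entry if the concept is new, otherwise append to it."""
    fingerprint_base.setdefault(concept, []).append(signature)
    return fingerprint_base


# Hypothetical registrations of the same concept in two documents.
base = {}
register_signature(base, "c7", ConceptSignature("doc://1", 120, 3, ["c2", "c9"], 0.8))
register_signature(base, "c7", ConceptSignature("doc://2", 45, 1, ["c4"], 0.6))
```

The first call creates the entry for the concept, and the second call adds a further signature to the existing entry.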

Once the fingerprint base has been constructed (step 25),
the fingerprint base is registered (step 26).

Figure 6 shows a process of identifying a document that 
is implemented on an on-line search platform 30. 

The purpose of identifying a document is to determine
whether a document presented as a query constitutes
reutilization of a document in the database. It is based on
measuring the similarity between documents. The purpose is to
identify documents containing protected elements. Copying can
be total or partial. When partial, the copied element will
have been subjected to modifications such as: eliminating
sentences from a text, eliminating a pattern from an image,
eliminating a shot or a sequence from a video document,
changing the order of terms, or substituting terms with other
terms in a text.

After presenting a document to be identified (step 31),
the terms are extracted from that document (step 32).

In association with the fingerprint base (step 25), the
concepts calculated from the terms extracted from the query
are put into correspondence with the concepts of the database
(step 33) in order to draw up a list of documents having
contents similar to the content of the query document.

The process of establishing the list is as follows:

p_dj designates the degree of resemblance between document
d_j and the query document, with 1 ≤ j ≤ N, where N is the
number of documents in the reference database.

All p_dj are initialized to zero.

For each term t_i in the query provided in step 331
(Figure 7), the concept c_i that represents it is determined
(step 332).

For each document d_j where the concept is present, its p_dj
is updated as follows:

p_dj = p_dj + f(frequency, score)

where several functions f can be used, e.g.:

f(frequency, score) = frequency × score

where frequency designates the number of occurrences of
concept c_i in document d_j and where score designates the mean
of the resemblance scores of the terms of document d_j with
concept c_i.

The p_dj are ordered, and those that are greater than a
given threshold are retained (step 333). Then the responses
are confirmed and validated (step 34).
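The accumulation and thresholding of the resemblance degrees can be sketched as follows, using f(frequency, score) = frequency × score. The layout of `fingerprint_base` (a dictionary mapping each concept to a list of document, frequency, score triples), the function name, and the sample values are assumptions made for illustration.

```python
from collections import defaultdict


def rank_documents(query_concepts, fingerprint_base, threshold):
    """Accumulate p_dj over the concepts of the query, order the documents,
    and keep those whose p_dj is greater than the threshold."""
    resemblance = defaultdict(float)                # every p_dj starts at zero
    for concept in query_concepts:                  # one concept per query term
        for doc, frequency, score in fingerprint_base.get(concept, []):
            resemblance[doc] += frequency * score   # f(frequency, score)
    ranked = sorted(resemblance.items(), key=lambda item: item[1], reverse=True)
    return [(doc, p) for doc, p in ranked if p > threshold]


# Hypothetical fingerprint base and query (three terms mapped to concepts).
fingerprint_base = {
    "c1": [("doc_a", 3, 0.5), ("doc_b", 1, 0.5)],
    "c2": [("doc_a", 2, 0.25)],
    "c3": [("doc_c", 4, 0.5)],
}
responses = rank_documents(["c1", "c2", "c1"], fingerprint_base, threshold=1.0)
```

Here doc_a accumulates 3.5 and is retained, doc_b reaches exactly 1.0 and is dropped by the strict threshold, and doc_c is never touched because none of the query concepts occurs in it.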

Response confirmation: the list of responses is filtered 
in order to retain only the responses that are the most 
pertinent. The filtering used is based on the correlation 
between the terms of the query and each of the responses. 

Validation: this serves to retain only those responses
where it is very certain that content has been reproduced.
During this step, responses are filtered, taking account of
algebraic and topological properties of the concepts within a
document: it is required that neighborhood in the query
document is matched in the response documents, i.e. two
concepts that are neighbors in the query document must also be
neighbors in the response document.

The list of response documents is delivered (step 35).

Consideration is given below in greater detail to
multimedia documents that contain images.

The description bears in particular on building up the
fingerprint base that is to be used as a tool for identifying
a document, based on using methods that are fast and effective
for identifying images and that take account of all of the
pertinent information contained in the images going from
characterizing the structures of objects that make them up, to
characterizing textured zones and background color. The
objects of the image are identified by producing a table
summarizing various statistics made on information about
object boundary zones and information on the neighborhoods of
said boundary zones. Textured zones can be characterized
using a description of the texture that is very fine, both
spatially and spectrally, based on three fundamental
characteristics, namely its periodicity, its overall
orientation, and the random appearance of its pattern.
Texture is handled herein as a two-dimensional random process.
Color characterization is an important feature of the method.
It can be used as a first sort to find responses that are
similar based on color, or as a final decision made to refine
the search.

In the initial stage of building up fingerprints, account
is taken of information classified in the form of components
belonging to two major categories:

• so-called "structural" components that describe how the
eye perceives an object that may be isolated or a set of
objects placed in an arrangement in three dimensions (images
81 and 82 of Figures 11 and 12); and

• so-called "textural" components that complement
structural components and represent the regularity or
uniformity of texture patterns (images 82 and 83 of Figures 12
and 13).

Figure 11 thus shows an image 81 containing structural
elements that do not present any texture patterns.

Figure 12 shows an image 82 containing structural
elements and a textured background.

Figure 13 shows an image 83 having no structural elements
but that is entirely textured.

As mentioned above, during the stage of building
fingerprints, each document in the document base is analyzed
so as to extract pertinent information therefrom. This
information is then indexed and analyzed. The analysis is
performed by a string of procedures that can be summarized as
three steps:

• for each document, extracting predefined
characteristics and storing this information in a "term"
vector;

• grouping together in a concept all of the terms that
are "neighboring" from the point of view of their
characteristics, thus enabling searching to be made more
concise; and

• building a fingerprint that characterizes the document
using a small number of entities. Each document is thus
associated with a fingerprint that is specific thereto.

Figure 8 shows the indexing of an image document 52
contained in a previously registered image base 51 in order to
characterize the image 52 by a finite number of parameters
that can subsequently be stored and manipulated easily. In
step 53, terms are extracted from the document to be searched
and they are stored in a buffer memory (step 54).

In step 55, projection is performed in the term space of
the reference base.

In step 56, a vectorial description is obtained giving 
pertinence values to the terms in the document to be searched. 

Step 57 consists in distributing the terms in N groups 58
of concepts.

Step 59 consists in projecting each group 58 into concept
space in order to obtain N partitions 62.

Finally, an orthogonal projection 62 leads to N sets 63
of reduced vectorial descriptions (RVD).

In a subsequent search stage, following a request made by
a user, e.g. to identify a query image, a search is made for
all multimedia documents that are similar or that comply with
the request. To do this, as mentioned above, the terms of the
query document are calculated and they are compared with the
concepts of the databases in order to deduce which document(s)
of the database is/are similar to the query document.

The stage of constructing the terms of an image is
described in greater detail below.

The stage of constructing the terms of an image usefully
implements characterization of the structural supports of the
image. Structural supports are elements making up a scene of
the image. The most significant are those that define the
objects of the scene since they characterize the various
shapes that are perceived when any image is observed.

This step concerns extracting structural supports. It
consists in dismantling boundary zones of image objects, where
boundaries are characterized by locations in which high levels
of intensity variation are observed between two zones. This
dismantling operates by a method that consists in distributing
the boundary zones amongst a plurality of "classes" depending
on the local orientation of the image gradient (the
orientation of the variation in local intensity). This
produces a multitude of small elements referred to as
structural support elements (SSE). Each SSE belongs to an
outline of a scene and is characterized by similarity in terms
of the local orientation of its gradient. This is a first
step that seeks to index all of the structural support
elements of the image.

The following process is then performed on the basis of
these SSEs, i.e. terms are constructed that describe the local
and global properties of the SSEs.

The information extracted from each support is considered
as constituting a local property. Two types of support can be
distinguished: straight rectilinear elements (SRE), and curved
arcuate elements (CAE).

The straight rectilinear elements SRE are characterized
by the following local properties:

• dimension (length, width);

• main direction (slope);

• statistical properties of the pixels constituting the
support (mean energy value, moments); and

• neighborhood information (local Fourier transform).

The curved arcuate elements CAE are characterized in the
same manner as above, together with the curvature of the arcs.

Global properties cover statistics such as the numbers of
supports of each type and their dispositions in space
(geometrical associations between supports: connexities, left,
right, middle, ...).

To sum up, for a given image, the pertinent information
extracted from the objects making up the image is summarized
in Table 1.

Structural supports of objects of an image

                   Type                              SSE   SRE      CAE
Global properties  Total number                      n     n1       n2
                   Number long (> threshold)         nl    n1l      n2l
                   Number short (< threshold)        nc    n1c      n2c
                   Number of long supports at a
                   left or right connection                n1lgdx   n2lgdx
                   Number of middle connection       -     n1lgdx   n2lgdx
                   Number of parallel long supports  -     n1pll    n2pll
Local properties   Luminance (> threshold)
                   Luminance (< threshold)           -
                   Slope
                   Curvature
                   Characterization of the
                   neighborhood of the supports

Table 1



The stage of constructing the terms of an image also
implements characterizing pertinent textural information of the
image. The information coming from the texture of the image
is subdivided into three visual appearances of the image:

• random appearance (such as an image of fine sand or
grass) where no particular arrangement can be determined;

• periodic appearance (such as a patterned knit) where a
repetition of dominant patterns (pixels or groups of pixels)
is observed; and finally

• a directional appearance where the patterns tend
overall to be oriented in one or more privileged directions.

This information is obtained by approximating the image
using parametric representations or models. Each appearance
is taken into account by means of the spatial and spectral
representations making up the pertinent information for this
portion of the image. Periodicity and orientation are
characterized by spectral supports while the random appearance
is represented by estimating parameters for a two-dimensional
autoregressive model.

Once all of the pertinent information has been extracted,
it is possible to proceed with structuring texture terms.

Spectral supports and autoregressive parameters of the texture of an image

Periodic component     Total number of periodic elements     np
                       Frequencies                           Pair (ω_p, ν_p), 0 < p < np
                       Amplitudes                            Pair (C_p, D_p), 0 < p < np
Directional component  Total number of directional elements  nd
                       Orientations                          Pair (α_i, β_i), 0 < p < np
                       Frequencies                           ν_i, 0 < i < nd
Random components      Noise standard deviation              σ
                       Autoregressive parameters             {a_{i,j}}, (i,j) ∈ S_{N,M}

Table 2



Finally, the stage of constructing the terms of an image
can also implement characterizing the color of the image.

Color is often represented by color histograms, which are
invariant under rotation and robust against occlusion and
changes in camera viewpoint.

Color quantification can be performed in the red, green,
blue (RGB) space, the hue, saturation, value (HSV) space, or
the LUV space, but the method of indexing by color histograms
has shown its limitations since it gives global information
about an image, so that during indexing it is possible to find
images that have the same color histogram but that are
completely different.

Numerous authors propose color histograms that integrate
spatial information. For example this can consist in
distinguishing between pixels that are coherent and pixels
that are incoherent, where a pixel is coherent if it belongs
to a relatively large region of identical pixels, and is
incoherent if it forms part of a region of small size.

A method of characterizing the spatial distribution of
the constituents of an image (e.g. its color) is described
below that is less expensive in terms of computation time than
the above-mentioned methods, and that is robust against
rotations and/or shifts.

The various characteristics extracted from the structural
support elements together with the parameters of the periodic,
directional, and random components of the texture field, and
also the parameters of the spatial distribution of the
constituents of the image constitute the "terms" that can be
used for describing the content of a document. These terms
are grouped together to constitute "concepts" in order to
reduce the amount of "useful information" of a document.

The occurrences of these concepts and their positions and
frequencies constitute the "fingerprint" of a document. These
fingerprints then act as links between a query document and
documents in a database while searching for a document.

An image does not necessarily contain all of the
characteristic elements described above. Consequently,
identifying an image begins with detecting the presence of its
constituent elements.

Figure 9 shows an example of a flow chart for a process
of extracting terms from an image, the process having a first
step 71 of characterizing image objects in terms of structural
supports, which, where appropriate, may be preceded by a test
for detecting structural elements, which test serves to omit
the step 71 if there are no structural elements.

Step 72 is a test for determining whether there exists a
textured background. If so, the process moves on to step 73
of characterizing the textured background in terms of spectral
supports and autoregressive parameters AR, followed by a step
74 of characterizing the background color.

If there is no textured background, then the process
moves directly from step 72 to step 74.
5 Finally, a step 75 lies in storing terms and building up 

fingerprints . 

The description returns in greater detail to 
characterizing the structural support elements of an image. 

The principle on which this characterization is based
consists in dismantling boundary zones of image objects into
multitudes of small base elements referred to as significant
support elements (SSEs) conveying useful information about
boundary zones that are made up of linear strips of varying
size, or of bends having different curvatures. Statistics
about these objects are then analyzed and used for building up
the terms of these structural supports.

In order to describe more rigorously the main methods
involved in this approach, a digitized image is written as
being the set {y(i,j), (i,j) ∈ I × J}, where I and J are
respectively the number of rows and the number of columns in
the image.

On the basis of previously calculated vertical gradient
images {g_v(i,j), (i,j) ∈ I × J} and horizontal gradient images
{g_h(i,j), (i,j) ∈ I × J}, this approach consists in
partitioning the image depending on the local orientation of
its gradient into a finite number of equidistant classes. The
image containing the orientation of the gradient is defined by
the following formula:

$$ o(i,j) = \arctan\left[\frac{g_h(i,j)}{g_v(i,j)}\right] \qquad (1) $$

A partition is no more than an angular decomposition in
the two-dimensional (2D) plane (from 0° to 360°) using a
well-defined quantization pitch. By using the local orientation of
the gradient as a criterion for decomposing boundary zones, it
is possible to obtain a better grouping of pixels that form
parts of the same boundary zone. In order to solve the
problem of boundary points that are shared between two
juxtaposed classes, a second partitioning is used, using the
same number of classes as before, but offset by half a class.

On the basis of these classes coming from the two
partitionings, a simple procedure consists in selecting those
that have the greatest number of pixels. Each pixel belongs
to two classes, each coming from a respective one of the two
partitionings. Given that each pixel is potentially an
element of an SSE, if any, the procedure opts for the class
that contains the greater number of pixels amongst those two
classes. This constitutes a region where the probability of
finding an SSE of larger size is the greatest possible. At
the end of this procedure, only those classes that contain
more than 50% of the candidates are retained. These are
regions of the support that are liable to contain SSEs.
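
By way of illustration only, the selection between the two offset partitionings can be sketched as follows. This is a minimal sketch and not the implementation of the invention: the function names, the use of atan2 to spread the orientation of equation (1) over the range 0° to 360°, the default of eight classes, and the handling of ties are all assumptions. The final retention of classes holding more than 50% of the candidates is not covered.

```python
import math


def orientation_deg(gh, gv):
    """Orientation of the gradient in degrees, in [0, 360).

    Equation (1) uses arctan(gh / gv); atan2 is an assumed extension
    that resolves the quadrant and tolerates gv == 0.
    """
    return math.degrees(math.atan2(gh, gv)) % 360.0


def class_pair(theta, n_classes):
    """Class of theta in the base partition and in the half-offset one."""
    width = 360.0 / n_classes
    base = int(theta // width) % n_classes
    shifted = int(((theta + width / 2.0) % 360.0) // width) % n_classes
    return base, shifted


def assign_classes(gh_img, gv_img, n_classes=8):
    """Give each boundary pixel the more populated of its two classes."""
    pairs, counts = {}, {}
    for i, (row_h, row_v) in enumerate(zip(gh_img, gv_img)):
        for j, (gh, gv) in enumerate(zip(row_h, row_v)):
            if gh == 0 and gv == 0:
                continue  # zero gradient: not a boundary candidate
            base, shifted = class_pair(orientation_deg(gh, gv), n_classes)
            keys = (("base", base), ("shifted", shifted))
            pairs[(i, j)] = keys
            for key in keys:
                counts[key] = counts.get(key, 0) + 1
    chosen = {}
    for pixel, (kb, ks) in pairs.items():
        # Ties go to the base partition (an arbitrary choice).
        chosen[pixel] = kb if counts[kb] >= counts[ks] else ks
    return chosen, counts
```

With three pixels oriented at 44°, 46°, and 50°, the base partition splits them between two classes, whereas the offset partition groups all three, so each pixel is assigned to the offset class.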

From these support regions, SSEs are determined and
indexed using certain criteria such as the following:

• length (for this purpose a threshold length l_0 is
determined and SSEs that are shorter and longer than the
threshold are counted);

• intensity, defined as the mean of the modulus of the
gradient of the pixels making up each SSE (a threshold written
I_0 is then defined, and SSEs that are below or above the
threshold are indexed); and

• contrast, defined as the difference between the pixel
maximum and the pixel minimum.
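
By way of illustration, the three criteria above can be turned into simple statistics as sketched below. The data layout (one dictionary per SSE with the hypothetical keys 'gradient' and 'values'), the use of the pixel count as the length, and the assignment of values equal to a threshold to the upper group are assumptions made for this sketch only.

```python
def sse_statistics(sses, l0, i0):
    """Count SSEs on each side of the length and intensity thresholds.

    Each SSE is a dict with two hypothetical keys:
      'gradient': gradient moduli of its pixels (one per pixel),
      'values':   gray levels of its pixels.
    """
    stats = {"short": 0, "long": 0, "weak": 0, "strong": 0, "contrast": []}
    for sse in sses:
        length = len(sse["gradient"])  # length taken as a pixel count
        intensity = sum(sse["gradient"]) / length  # mean gradient modulus
        stats["long" if length >= l0 else "short"] += 1
        stats["strong" if intensity >= i0 else "weak"] += 1
        stats["contrast"].append(max(sse["values"]) - min(sse["values"]))
    return stats
```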

At this step in the method, all of the so-called
structural elements are known and indexed in compliance with
pre-identified types of structural support. They can be
extracted from the original image in order to leave room for
characterizing the texture field.

By way of example, consider image 81 in Figure 11,
reproduced as image 101 in Figure 14a, having boundary zones
that are shown in image 102 of Figure 14b. The elements of
these boundary zones are then dismantled and, depending on the
orientation of their gradients, they are distributed amongst
the various classes represented by images 103 to 106 of
Figures 14c to 14f. These various elements constitute the
significant support elements, and a statistical analysis
thereof serves to build up the terms of the structural
component.

In Figures 14c to 14f, by way of example, image 103
corresponds to a class 0 (0° - 45°), image 104 corresponds to
a class 1 (45° - 90°), image 105 corresponds to a class 2 (90°
- 135°), and image 106 corresponds to a class 3 (135° - 180°).

In the absence of structural elements, it is assumed that
the image is textured with patterns that are regular to a
greater or lesser extent, and the texture field is then
characterized. For this purpose, it is possible to decompose
the image into three components as follows:

• a textural component containing anarchic or random
information (such as an image of fine sand or grass) in which
no particular arrangement can be determined;

• a periodic component (such as a patterned knit) in
which repeating dominant patterns are observed; and finally

• a directional component in which the patterns tend
overall towards one or more privileged directions.

Since the idea is to characterize accurately the texture
of the image on the basis of a set of parameters, these three
components are represented by parametric models.

Thus, the texture of the regular and homogeneous image 15
written {y(i,j), (i,j) ∈ I × J} is decomposed into three
components 16, 17, and 18 as shown in Figure 10, using the
following relationship:

$$ \{y(i,j)\} = \{w(i,j)\} + \{h(i,j)\} + \{e(i,j)\} \qquad (16) $$

where {w(i,j)} is the purely random component 16, {h(i,j)}
is the harmonic component 17, and {e(i,j)} is the
directional component 18. This step of extracting information
from a document is terminated by estimating parameters for
these three components 16, 17, and 18. Methods of making such
estimates are described in the following paragraphs.

The description begins with an example of a method for
detecting and characterizing the directional component of the
image.

Initially it consists in applying a parametric model to
the directional component {e(i,j)}. It is constituted by a
denumerable sum of directional elements in which each is
associated with a pair of integers (α, β) defining an
orientation of angle θ such that θ = tan⁻¹(β/α). In other words,
e(i,j) is defined by:

$$ e(i,j) = \sum_{(\alpha,\beta) \in O} e_{(\alpha,\beta)}(i,j) $$

in which each e_{(α,β)}(i,j) is defined by:

$$ e_{(\alpha,\beta)}(i,j) = \sum_{k=1}^{Ne} \left[ s_k^{(\alpha,\beta)}(i\alpha - j\beta) \cos\left(2\pi \frac{\nu_k}{\alpha^2 + \beta^2}(i\beta + j\alpha)\right) + t_k^{(\alpha,\beta)}(i\alpha - j\beta) \sin\left(2\pi \frac{\nu_k}{\alpha^2 + \beta^2}(i\beta + j\alpha)\right) \right] \qquad (17) $$

where:

• Ne is the number of directional elements associated
with (α, β);

• ν_k is the frequency of the k-th element; and

• {s_k(iα - jβ)} and {t_k(iα - jβ)} are the amplitudes.

The directional component {e(i,j)} is thus completely
defined by knowing the parameters contained in the following
vector E:

$$ E = \left\{ \alpha_l, \beta_l, \left\{ \nu_{lk}, \{s_{lk}(c)\}, \{t_{lk}(c)\} \right\}_{k=1}^{Ne} \right\}_{(\alpha_l,\beta_l) \in O} \qquad (18) $$

In order to estimate these parameters, use is made of the
fact that the directional component of an image is represented
in the spectral domain by a set of straight lines of slopes
orthogonal to those defined by the pairs of integers (α_l, β_l)
of the model, which are written (α_l, β_l)^⊥. These straight lines
can be decomposed into subsets of same-slope lines each
associated with a directional element.

By way of illustration, Figures 15a and 15b show images
84 and 86 each containing one directional element, while
Figure 15c shows an image 88 containing two directional
elements.

Figure 15a1 shows a plot 85 in three dimensions of the
spectrum of the image 84 of Figure 15a.

Figures 15b1 and 15c1 are Fourier modulus images 87, 89
associated respectively with the images 86 and 88 of
Figures 15b and 15c.
In order to calculate the elements of the vector E, it is
possible to adopt an approach based on projecting the image in
different directions. The method consists initially in making
sure that a directional component is present before estimating
its parameters.

The directional component of the image is detected on the
basis of knowledge about its spectral properties. If the
spectrum of the image is considered as being a three-dimensional
image (X, Y, Z) in which (X, Y) represent the
coordinates of the pixels and Z represents amplitude, then the
lines that are to be detected are represented by a set of
peaks concentrated along lines of slopes that are defined by
the looked-for pairs (α_l, β_l) (cf. Figure 15a1). In order to
determine the presence of such lines, it suffices to count the
predominant peaks. The number of these peaks provides
information about the presence or absence of harmonics or
directional supports.

There follows a description of an example of the method
of characterizing the directional component. To do this,
direction pairs (α_l, β_l) are calculated and the number of
directional elements is determined.

The method begins with calculating the discrete Fourier
transform (DFT) of the image followed by an estimate of the
rational slope lines observed in the transformed image ψ(i,j).

To do this, a discrete set of projections is defined
subdividing the frequency domain into different projection
angles θ_k, where k is finite. This projection set can be
obtained in various ways. For example it is possible to
search for all pairs of mutually prime integers (α_k, β_k)
defining an angle θ_k such that θ_k = tan⁻¹(α_k/β_k), where
0 ≤ θ_k ≤ π/2. An order r such that 0 ≤ α_k, β_k ≤ r serves to
control the number of projections. Symmetry properties can then
be used for obtaining all pairs up to 2π. These pairs are shown in
Figure 16 for 0 ≤ α_k, β_k ≤ 3.
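
By way of illustration, the set of projection pairs for a given order r can be enumerated as sketched below. The function name is hypothetical, and the inclusive bounds 0 ≤ α_k, β_k ≤ r (which admit the horizontal and vertical directions) are an assumption; only the first quadrant is produced, the remaining pairs following by symmetry.

```python
from math import atan2, gcd


def projection_pairs(r):
    """Mutually prime pairs (a, b) with 0 <= a, b <= r, sorted by angle.

    The angle of a pair is atan(a / b), computed with atan2 so that
    b == 0 (an angle of 90 degrees) is handled. gcd(0, 0) is 0, so the
    pair (0, 0) is excluded automatically.
    """
    pairs = [(a, b)
             for a in range(r + 1)
             for b in range(r + 1)
             if gcd(a, b) == 1]
    return sorted(pairs, key=lambda p: atan2(p[0], p[1]))
```

For r = 3 this sketch yields nine pairs in the first quadrant, running from (0, 1) at 0° to (1, 0) at 90°.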

The projections of the modulus of the DFT of the image
are performed along the angle θ_k. Each projection generates a
vector of dimension 1, V_{(α_k,β_k)}, written V_k to simplify the
notation, which contains the looked-for directional
information.

Each projection V_k is given by the formula:

$$ V_k(i,j) = \sum_{\tau} \psi(i + \tau\beta_k,\; j + \tau\alpha_k), \quad 0 \le i + \tau\beta_k \le I-1, \quad 0 \le j + \tau\alpha_k \le J-1 \qquad (19) $$

with n = -i·β_k + j·α_k, 0 ≤ |n| ≤ N_k, and
N_k = |α_k|(T-1) + |β_k|(L-1) + 1,
where T × L is the size of the image. ψ(i,j) is the modulus
of the Fourier transform of the image to be characterized.
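
By way of illustration, a projection of a spectrum along a direction can be sketched as an accumulation of pixel values into bins indexed by a line offset. The indexing convention used here, in which the pixel (i, j) falls on the line i·a - j·b = c, is an assumption chosen for self-consistency and may differ from the convention of equation (19); the function names are hypothetical.

```python
def project(psi, a, b):
    """Sum psi(i, j) along the lines i*a - j*b = c.

    Returns a dict mapping each offset c to the sum of the values
    found on the corresponding line.
    """
    sums = {}
    for i, row in enumerate(psi):
        for j, value in enumerate(row):
            c = i * a - j * b
            sums[c] = sums.get(c, 0.0) + value
    return sums


def dominant_bin(sums):
    """Offset of the largest projection value, and that value."""
    c_max = max(sums, key=sums.get)
    return c_max, sums[c_max]
```

For a 4 × 4 array the pair (1, 1) produces 3 + 3 + 1 = 7 bins, which agrees with the expression given above for N_k.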

For each V_k, the high energy elements and their positions
in space are selected. These high energy elements are those
that present a maximum value relative to a threshold that is
calculated depending on the size of the image.

At this stage of the calculation, the number of lines is
known. The number of directional components Ne is deduced
therefrom by using the simple spectral properties of the
directional component of a textured image. These properties
are as follows:

1) The lines observed in the spectral domain of a
directional component are symmetrical relative to the origin.
Consequently, it is possible to reduce the investigation
domain to cover only half of the domain under consideration.

2) The maximums retained in the vector are candidates for
representing lines belonging to directional elements. On the
basis of knowledge of the respective positions of the lines on
the modulus of the discrete Fourier transform (DFT), it is
possible to deduce the exact number of directional elements.
The position of the line maximum corresponds to the argument
of the maximum of the vector V_k, the other lines of the same
element being situated every min{L,T}.

The projection mechanism is shown in Figure 17 for
(α_k, β_k) = (2, -1).

After processing the vectors V_k and producing the
direction pairs (α̂_k, β̂_k), the number of lines associated with
each pair is known.

It is thus possible to count the total number of
directional elements by using the two above-mentioned
properties, and the pairs of integers (α̂_k, β̂_k) associated with
these components are identified, i.e. the directions that are
orthogonal to those that have been retained.

For all of these pairs (α̂_k, β̂_k), estimating the frequencies
of each detected element can be done immediately. If
consideration is given solely to the points of the original
image along the straight line of equation iα̂_k - jβ̂_k = c, where
c is the position of the maximum in V_k, then these points
constitute a harmonic one-dimensional (1D) signal of constant
amplitude at a frequency ν̂_k^{(α,β)}. It then suffices to estimate
the frequency of this 1D signal by a conventional method
(locating the maximum value on the 1D DFT of this new signal).
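
By way of illustration, the conventional frequency estimate mentioned above (locating the maximum of the 1D DFT) can be sketched as follows. The function name is hypothetical, and the direct evaluation of the DFT is chosen only for brevity; the result is expressed in cycles per sample with the resolution of one DFT bin.

```python
import cmath
import math


def dominant_frequency(signal):
    """Frequency (cycles per sample) of the largest non-DC DFT bin."""
    n = len(signal)
    best_bin, best_mag = 0, -1.0
    # Only bins 1 .. n/2 are scanned: the spectrum of a real signal
    # is symmetric and bin 0 is the mean value.
    for k in range(1, n // 2 + 1):
        acc = 0j
        for t, x in enumerate(signal):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin / n
```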

To summarize, it is possible to implement the method
comprising the following steps:

• Determining the maximum of each projection.

• Filtering the maximums so as to retain only those that
are greater than a threshold.

• For each maximum m_i corresponding to a pair (α̂_k, β̂_k):

- determining the number of lines associated with said pair
from the above-described properties; and

- calculating the frequency associated with (α̂_k, β̂_k),
corresponding to the intersection of the horizontal axis and
the maximum line (corresponding to the maximum of the retained
projection).

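There follows a description of how the amplitudes {ŝ_k^{(α,β)}(c)}
and {t̂_k^{(α,β)}(c)} are calculated, which are the other parameters
contained in the above-mentioned vector E.

Given the direction (α̂_k, β̂_k) and the frequency ν̂_k, it is
possible to determine the amplitudes ŝ_k^{(α,β)}(c) and t̂_k^{(α,β)}(c), for c
satisfying the formula iα̂_k - jβ̂_k = c, using a demodulation method.
ŝ_k^{(α,β)}(c) is equal to the mean of the pixels along the straight
line of equation iα̂_k - jβ̂_k = c of the new image that is obtained
by multiplying y(i,j) by:

$$ \cos\left( \frac{\hat{\nu}_k^{(\alpha,\beta)}}{\hat{\alpha}_k^2 + \hat{\beta}_k^2} (i\hat{\beta}_k + j\hat{\alpha}_k) \right) $$

This can be written as follows:

$$ \hat{s}_k^{(\alpha,\beta)}(c) = \frac{1}{N_s} \sum_{i\hat{\alpha}_k - j\hat{\beta}_k = c} y(i,j) \cos\left( \frac{\hat{\nu}_k^{(\alpha,\beta)}}{\hat{\alpha}_k^2 + \hat{\beta}_k^2} (i\hat{\beta}_k + j\hat{\alpha}_k) \right) \qquad (20) $$

where N_s is the number of elements in this new signal.
Similarly, t̂_k^{(α,β)}(c) can be obtained by applying the equation:

$$ \hat{t}_k^{(\alpha,\beta)}(c) = \frac{1}{N_s} \sum_{i\hat{\alpha}_k - j\hat{\beta}_k = c} y(i,j) \sin\left( \frac{\hat{\nu}_k^{(\alpha,\beta)}}{\hat{\alpha}_k^2 + \hat{\beta}_k^2} (i\hat{\beta}_k + j\hat{\alpha}_k) \right) \qquad (21) $$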

The above-described method can be summarized by the
following steps:

For every directional element (α̂_k, β̂_k), do:

For every line (d), calculate:

1) The mean of the points (i,j) weighted by:

$$ \cos\left( \frac{\hat{\nu}_k^{(\alpha,\beta)}}{\hat{\alpha}_k^2 + \hat{\beta}_k^2} (i\hat{\beta}_k + j\hat{\alpha}_k) \right) $$

This mean corresponds to the estimated amplitude ŝ_k^{(α,β)}(d).

2) The mean of the points (i,j) weighted by:

$$ \sin\left( \frac{\hat{\nu}_k^{(\alpha,\beta)}}{\hat{\alpha}_k^2 + \hat{\beta}_k^2} (i\hat{\beta}_k + j\hat{\alpha}_k) \right) $$

This mean corresponds to the estimated amplitude t̂_k^{(α,β)}(d).
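
By way of illustration, the demodulation along each line can be sketched as follows. The function name and the dictionary returned are hypothetical. Two points are assumptions of this sketch: the factor 2π of the model of equation (17) is written explicitly in the phase, and the raw means are returned without rescaling (for a pure sinusoid sampled over whole periods, the mean of y·cos equals half of the cosine amplitude).

```python
import math


def demodulate(y, a, b, nu):
    """Per-line means of y*cos(phase) and y*sin(phase).

    Lines are the sets of pixels with i*a - j*b = c. The phase is
    2*pi*nu/(a^2 + b^2) * (i*b + j*a), following the model of
    equation (17). Returns {c: (mean_cos, mean_sin)}.
    """
    norm = a * a + b * b
    acc = {}
    for i, row in enumerate(y):
        for j, value in enumerate(row):
            c = i * a - j * b
            phase = 2.0 * math.pi * nu / norm * (i * b + j * a)
            s, t, n = acc.get(c, (0.0, 0.0, 0))
            acc[c] = (s + value * math.cos(phase),
                      t + value * math.sin(phase),
                      n + 1)
    return {c: (s / n, t / n) for c, (s, t, n) in acc.items()}
```

With (a, b) = (1, 0) each image row is one line, so a row holding a cosine of amplitude 2 at frequency 0.25 gives a cosine mean of 1 and a sine mean of 0.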

Table 3 below summarizes the main steps in the projection
method.

Step 1. Calculate the set of projection pairs (α_k, β_k) ∈ P_r.

Step 2. Calculate the modulus of the DFT of the image
y(i,j): ψ(ω,ν) = |DFT(y(i,j))|.

Step 3. For every (α_k, β_k) ∈ P_r, calculate the vector V_k:
the projection of ψ(ω,ν) along (α_k, β_k) using equation
(19).

Step 4. Detecting lines:
For every (α_k, β_k) ∈ P_r:

• determine M_k = max_j {V_k(j)};

• calculate n_k, the number of pixels of significant
value encountered along the projection;

• save n_k and j_max, the index of the maximum in V_k;

• select the directions that satisfy the criterion
M_k > s_e,
where s_e is a threshold to be defined, depending on the
size of the image.

The directions that are retained are considered as being
the directions of the looked-for lines.

Step 5. Save the looked-for pairs (α̂_k, β̂_k), which are the
orthogonals of the pairs (α_k, β_k) retained in step 4.

Table 3
There follows a description of detecting and
characterizing periodic textural information in an image, as
contained in the harmonic component {h(i,j)}. This component
can be represented as a finite sum of 2D sinewaves:

$$ h(i,j) = \sum_{p=1}^{P} \left[ C_p \cos 2\pi(i\omega_p + j\nu_p) + D_p \sin 2\pi(i\omega_p + j\nu_p) \right] \qquad (22) $$

where:

• C_p and D_p are amplitudes; and

• (ω_p, ν_p) is the p-th spatial frequency.

Figure 18a1 shows an image 91 containing periodic
components, and Figure 18b1 is a synthesized image containing
one periodic component.

Figure 18a2 shows an image 92 which is an image of the
modulus of the DFT presenting a set of peaks.

Figure 18b2 is a 3D view 94 of the DFT which shows the
presence of a symmetrical pair of peaks 95, 96.

In the spectral domain, the harmonic component thus
appears as a pair of isolated peaks that are symmetrical about
the origin (cf. Figures 18a2 and 18b2). This component reflects
the existence of periodicities in the image.

The information that is to be determined is constituted
by the elements of the vector:

$$ H = \left\{ P, \{C_p, D_p, \omega_p, \nu_p\}_{p=1}^{P} \right\} \qquad (23) $$

For this purpose, the procedure begins by detecting the
presence of said periodic component in the image of the
modulus of the Fourier transform, after which its parameters
are estimated.

Detecting the periodic component consists in determining
the presence of isolated peaks in the image of the modulus of
the DFT. The procedure is the same as when determining the
directional components. If the value n_k obtained during step 4
of the method described in Table 3 is less than a threshold,
then isolated peaks are present that characterize the presence
of a harmonic component, rather than peaks that form a
continuous line.
Characterizing the periodic component amounts to locating
the isolated peaks in the image of the modulus of the DFT.

These spatial frequencies (ω̂_p, ν̂_p) correspond to the
positions of said peaks:

$$ (\hat{\omega}_p, \hat{\nu}_p) = \arg\max_{(\omega,\nu)} \psi(\omega,\nu) \qquad (24) $$

In order to calculate the amplitudes (Ĉ_p, D̂_p), a
demodulation method is used as for estimating the amplitudes
of the directional component.

For each periodic element of frequency (ω̂_p, ν̂_p), the
corresponding amplitude is identical to the mean of the pixels
of the new image obtained by multiplying the image {y(i,j)} by
cos(iω̂_p + jν̂_p). This is represented by the following equations:

$$ \hat{C}_p = \frac{1}{L \times T} \sum_{n=0}^{L-1} \sum_{m=0}^{T-1} y(n,m) \cos(n\hat{\omega}_p + m\hat{\nu}_p) \qquad (25) $$

$$ \hat{D}_p = \frac{1}{L \times T} \sum_{n=0}^{L-1} \sum_{m=0}^{T-1} y(n,m) \sin(n\hat{\omega}_p + m\hat{\nu}_p) \qquad (26) $$

To sum up, a method of estimating the periodic component
comprises the following steps:

Step 1. Locate the isolated peaks in the second half of
the image of the modulus of the Fourier transform and
count the number of peaks.

Step 2. For each detected peak:

• calculate its frequency using equation (24);

• calculate its amplitude using equations (25-26).
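
By way of illustration, the two steps above can be sketched for a small image as follows. The function names are hypothetical. Frequencies are expressed as DFT bin indices (u, v), the phase is written as 2π(u·i/rows + v·j/cols), only the single strongest peak in one half of the spectrum is located, and the DFT is evaluated directly for brevity; these are assumptions of the sketch.

```python
import cmath
import math


def dft_modulus(y):
    """Modulus of the 2D DFT, computed directly (fine for small images)."""
    rows, cols = len(y), len(y[0])
    psi = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            acc = 0j
            for i in range(rows):
                for j in range(cols):
                    angle = 2.0 * math.pi * (u * i / rows + v * j / cols)
                    acc += y[i][j] * cmath.exp(-1j * angle)
            psi[u][v] = abs(acc)
    return psi


def strongest_peak(psi):
    """Arg max of psi over one half of the spectrum, DC term excluded."""
    rows, cols = len(psi), len(psi[0])
    candidates = [(u, v)
                  for u in range(rows // 2 + 1)
                  for v in range(cols)
                  if (u, v) != (0, 0)]
    return max(candidates, key=lambda uv: psi[uv[0]][uv[1]])


def amplitudes(y, u, v):
    """Means of y*cos and y*sin at bin (u, v), in the manner of (25-26)."""
    rows, cols = len(y), len(y[0])
    c_sum = d_sum = 0.0
    for i in range(rows):
        for j in range(cols):
            angle = 2.0 * math.pi * (u * i / rows + v * j / cols)
            c_sum += y[i][j] * math.cos(angle)
            d_sum += y[i][j] * math.sin(angle)
    return c_sum / (rows * cols), d_sum / (rows * cols)
```

As with the directional component, the raw mean of y·cos for a pure cosine equals half of its amplitude.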



The last information to be extracted is contained in the
purely random component {w(i,j)}. This component may be
represented by a 2D autoregressive model with non-symmetrical
half-plane (NSHP) support, defined by the following
difference equation:

$$ w(i,j) = -\sum_{(k,l) \in S_{N,M}} a_{k,l}\, w(i-k, j-l) + u(i,j) \qquad (27) $$

where {a_{k,l}}, (k,l) ∈ S_{N,M}, are the parameters to be
determined for every (k,l) belonging to:

$$ S_{N,M} = \{(k,l) \mid k = 0,\ 1 \le l \le M\} \cup \{(k,l) \mid 1 \le k \le N,\ -M \le l \le M\} $$

The pair (N, M) is known as the order of the model.

• {u(i,j)} is Gaussian white noise of finite variance σ_u².

The parameters of the model are given by:

$$ W = \left\{ N, M, \sigma_u^2, \{a_{k,l}\}_{(k,l) \in S_{N,M}} \right\} \qquad (28) $$

The methods of estimating the elements of W are numerous,
such as for example the 2D Levinson algorithm or adaptive
methods of the least squares type (LS).
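
By way of illustration, the NSHP support and the driving noise of equation (27) can be sketched as follows. The function names and the convention of treating samples outside the image as zero are assumptions; the estimation of the coefficients themselves (Levinson or least squares) is not covered.

```python
def nshp_support(n, m):
    """Offsets (k, l) of the non-symmetrical half-plane support S_{N,M}."""
    first_row = [(0, l) for l in range(1, m + 1)]
    other_rows = [(k, l) for k in range(1, n + 1) for l in range(-m, m + 1)]
    return first_row + other_rows


def ar_residual(w, coeffs, i, j):
    """u(i, j) = w(i, j) + sum of a_kl * w(i - k, j - l).

    coeffs maps (k, l) to a_kl. Samples falling outside the image are
    treated as zero (an assumption of this sketch).
    """
    rows, cols = len(w), len(w[0])
    total = w[i][j]
    for (k, l), a in coeffs.items():
        ii, jj = i - k, j - l
        if 0 <= ii < rows and 0 <= jj < cols:
            total += a * w[ii][jj]
    return total
```

The support of order (N, M) holds M + N(2M + 1) offsets, for example 4 offsets for the order (1, 1).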

There follows a description of a method of characterizing
the color of an image from which it is desired to extract
terms t_i representing iconic characteristics of the image,
where color is a particular example of characteristics that
can comprise other characteristics such as algebraic or
geometrical moments, statistical properties, or the spectral
properties of pseudo-Zernike moments.

The method is based on perceptual characterization of
color. Firstly, the color components of the image are
transformed from red, green, blue (RGB) space to hue,
saturation, value (HSV) space. This produces three
components: hue, saturation, value. On the basis of these
three components, N colors or iconic components of the image
are determined. Each iconic component Ci is represented by a
vector of M values. These values represent the angular and
annular distribution of points representing each component,
and also the number of points of the component in question.

The method developed is shown in Figure 19 using, by way
of example, N = 16 and M = 17.

In a first main step 110, starting from an image 111 in
RGB space, the image 111 is transformed from RGB space into
HSV space (step 112) in order to obtain an image in HSV space.
The HSV model can be defined as follows.

Hue (H): varies over the range [0, 360], where each angle
represents a hue.
Saturation (S): varies over the range [0, 1], measuring
the purity of colors, thus serving to distinguish between
colors that are "vivid", "pastel", or "faded".

Value (V): takes values in the range [0, 1], indicates the
lightness or darkness of a color and the extent to which it is
close to white or black.

The HSV model is a non-linear transformation of the RGB
model. The human eye can distinguish 128 hues, 130
saturations, and 23 shades.

For white, V = 1 and S = 0; black has a value V = 0, and
hue and saturation H and S are undetermined. When V = 1 and S
= 1, then the color is pure.

Each color is obtained by adding black or white to the
pure color.

In order to have colors that are lighter, S is reduced
while maintaining H and V, and in contrast in order to have
colors that are darker, black is added by reducing V while
leaving H and S unchanged.

Going from the color image expressed in RGB coordinates
to an image expressed in HSV space is performed as follows:

For every point of coordinates (i,j) and of value (R_k, G_k,
B_k), produce a point of coordinates (i,j) and of value (H_k, S_k,
V_k), with:
$$ V_k = \max(R_k, G_k, B_k) $$

$$ S_k = \frac{V_k - \min(R_k, G_k, B_k)}{V_k} $$

$$ H_k = \begin{cases} \dfrac{G_k - B_k}{V_k - \min(R_k, G_k, B_k)} & \text{if } V_k \text{ is equal to } R_k \\[2ex] 2 + \dfrac{B_k - R_k}{V_k - \min(R_k, G_k, B_k)} & \text{if } V_k \text{ is equal to } G_k \\[2ex] 4 + \dfrac{R_k - G_k}{V_k - \min(R_k, G_k, B_k)} & \text{if } V_k \text{ is equal to } B_k \end{cases} $$
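
By way of illustration, the conversion of one pixel can be sketched as follows. The function name is hypothetical. The multiplication of the hue by 60 to express it in degrees over the range [0, 360), and the use of None for a hue or saturation that is undetermined, are assumptions of this sketch; the inputs are assumed to lie in the range [0, 1].

```python
def rgb_to_hsv(r, g, b):
    """Convert RGB values in [0, 1] to (H in degrees, S, V).

    H is None when it is undetermined (black, white, and grays);
    S is None for black.
    """
    v = max(r, g, b)
    delta = v - min(r, g, b)
    if v == 0:
        return None, None, 0.0  # black
    s = delta / v
    if delta == 0:
        return None, s, v  # white or gray: no dominant hue
    if v == r:
        h = (g - b) / delta
    elif v == g:
        h = 2.0 + (b - r) / delta
    else:
        h = 4.0 + (r - g) / delta
    # Each unit of h spans 60 degrees; wrap negative values.
    return (60.0 * h) % 360.0, s, v
```

For example, pure red maps to a hue of 0°, pure green to 120°, and pure blue to 240°.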



Thereafter, the HSV space is partitioned (step 113).

N colors are defined from the values given to hue,
saturation, and value. When N equals 16, then the colors are
as follows: black, white, pale gray, dark gray, medium gray,
red, pink, orange, brown, olive, yellow, green, sky blue, blue
green, blue, purple, magenta.

For each pixel, the color to which it belongs is
determined. Thereafter, the number of points having each
color is calculated.

In a second main step 120, the partitions obtained during
the first main step 110 are characterized.

In this step 120, an attempt is made to characterize each
previously obtained partition Ci. A partition is defined by
its iconic component and by the coordinates of the pixels that
make it up. The description of a partition is based on
characterizing the spatial distribution of its pixels (cloud
of points). The method begins by calculating the center of
gravity, the major axis of the cloud of points, and the axis
perpendicular thereto. This new index is used as a reference
in decomposing the partition Ci into a plurality of
sub-partitions that are represented by the percentage of points
making up each of the sub-partitions. The process of
characterizing a partition Ci is as follows:

• calculating the center of gravity and the orientation
angle of the component Ci defining the partitioning index;

• calculating the angular distribution of the points of
the partition Ci in the N directions operating
counterclockwise, in N sub-partitions defined as follows:

$$ \left( 0^\circ, \frac{360}{N}, \frac{2 \times 360}{N}, \ldots, \frac{i \times 360}{N}, \ldots, \frac{(N-1) \times 360}{N} \right) $$

• partitioning the image space into squares of concentric
radii, and calculating on each radius the number of points
corresponding to each iconic component.

The characteristic vector is obtained from the number of
points of each distribution of color Ci, the number of points
in the 8 angular sub-distributions, and the number of image
points.

Thus, the characteristic vector is represented by 17
values in this example.
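
By way of illustration, a 17-value vector for one partition can be sketched as 8 angular counts, 8 annular counts, and the number of points. The function name is hypothetical. The estimate of the major axis from second-order moments, the use of circular rings scaled to the farthest point (rather than concentric squares), and the layout of the vector are assumptions of this sketch. The axis angle obtained from moments is defined only modulo 180°, which the sketch does not resolve.

```python
import math


def partition_vector(points, n_sectors=8, n_rings=8):
    """Angular counts, annular counts, and point count of a partition.

    points is a list of (x, y) pixel coordinates of one component.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n  # center of gravity
    cy = sum(y for _, y in points) / n
    mu20 = sum((x - cx) ** 2 for x, _ in points)
    mu02 = sum((y - cy) ** 2 for _, y in points)
    mu11 = sum((x - cx) * (y - cy) for x, y in points)
    theta = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)  # major-axis angle
    r_max = max(math.hypot(x - cx, y - cy) for x, y in points) or 1.0
    angular = [0] * n_sectors
    annular = [0] * n_rings
    for x, y in points:
        dx, dy = x - cx, y - cy
        # Angle measured counterclockwise from the major axis.
        rel = (math.atan2(dy, dx) - theta) % (2.0 * math.pi)
        sector = min(int(rel / (2.0 * math.pi) * n_sectors), n_sectors - 1)
        ring = min(int(math.hypot(dx, dy) / r_max * n_rings), n_rings - 1)
        angular[sector] += 1
        annular[ring] += 1
    return angular + annular + [n]
```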

Figure 19 shows the second step 120 of processing on the
basis of iconic components C0 to C15 showing for the
components C0 (module 121) and C15 (module 131), the various
steps undertaken, i.e. angular partitioning 122, 132 leading
to a number of points in the eight orientations under
consideration (step 123, 133), and annular partitioning 124,
134 leading to a number of points on the eight radii under
consideration (step 125, 135), and also taking account of the
number of pixels of the component (C0 or C15 as appropriate)
in the image (step 126 or step 136).

Steps 123, 125, and 126 produce 17 values for the
component C0 (step 127) and steps 133, 135, and 136 produce 17
values for the component C15 (step 137).

Naturally, the process is analogous for the other
components C1 to C14.

Figures 20 and 21 show the fact that the above-described
process is invariant in rotation.

Thus, in the example of Figure 20, the image is
partitioned in two subsets, one containing crosses X and the
other circles O. After calculating the center of gravity and
the orientation angle θ, an orientation index is obtained that
enables four angular sub-divisions (0°, 90°, 180°, 270°) to be
obtained.

Thereafter, an annular distribution is performed, with
the numbers of points on a radius equal to 1 and then on a
radius equal to 2 being calculated. This produces the vector
V0 characteristic of the image of Figure 20: 19; 6; 5; 4; 4;
8; 11.

The image of Figure 21 is obtained by turning the image
of Figure 20 through 90°. By applying the above method to the
image of Figure 21, a vector V1 is obtained characterizing the
image and demonstrating that the rotation has no influence on
the characteristic vector. This makes it possible to conclude
that the method is invariant in rotation.

As mentioned above, methods making it possible to obtain
for each image the terms representing the dominant colors, the
textural properties, or the structures of the dominant zones
of the image, can be applied equally well to the entire image
or to portions of the image.

There follows a brief description of the process whereby
a document can be segmented in order to produce image portions
for characterizing.

In a first possible technique, static decomposition is
performed. The image is decomposed into blocks with or
without overlapping.

In a second possible technique, dynamic decomposition is
performed. Under such circumstances, the image is decomposed
into portions as a function of the content of the image.

In a first example of the dynamic decomposition
technique, the portions are produced from germs constituted by
singularity points in the image (points of inflection). The
germs are calculated initially, and they are subsequently
fused so that only a small number remain, and finally the
image points are fused with the germs having the same visual
properties (statistics) in order to produce the portions or
the segments of the image to be characterized.

In another technique that relies on hierarchical
segmentation, the image points are fused to form n first
classes. Thereafter, the points of each of the classes are
decomposed into m classes and so on until the desired number
of classes is reached. During fusion, points are allocated to
the nearest class. A class is represented by its center of
gravity and/or a boundary (a surrounding box, a segment, a
curve, ...).

The main steps of a method of characterizing the shapes
of an image are described below.

Shape characterization is performed in a plurality of
steps:

To eliminate a zoom effect or variation due to movement
of non-rigid elements in an image (movement of lips, leaves on
a tree, ...), the image is subjected to multiresolution followed
by decimation.

To reduce the effect of shifting in translation, the
image or image portion is represented by its Fourier
transform.

To reduce the zoom effect, the image is defined in polar
logarithmic space.
The following steps can be implemented:

a) multiresolution: f = wavelet(I, n), where I is the
starting image and n is the number of decompositions;

b) projection of the image into log-polar space:
g(l,m) = f(i,j) with i = l*cos(m) and j = l*sin(m);

c) calculating the Fourier transform of g: H = FFT(g);

d) characterizing H:

d1) projecting H in a plurality of directions (0°,
45°, 90°, ...): the result is a set of vectors of dimension equal
to the dimension of the projection segment;

d2) calculating the statistical properties of each
projection vector (mean, variance, moments).

The term representing shape is constituted by the values
of the statistical properties of each projection vector.
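
By way of illustration, steps b) to d2) can be sketched for a small image as follows. All function names are hypothetical. The wavelet decomposition of step a) is omitted, the log-polar grid uses nearest-neighbor sampling with radii growing geometrically from one pixel to the largest inscribed radius, the Fourier transform is evaluated directly and reduced to its modulus, and only the 0° and 90° projections with their mean and variance are computed; each of these choices is an assumption of the sketch.

```python
import cmath
import math


def log_polar(f, n_rho, n_theta):
    """Resample f on a log-polar grid centered on the image (step b)."""
    rows, cols = len(f), len(f[0])
    ci, cj = (rows - 1) / 2.0, (cols - 1) / 2.0
    r_max = min(ci, cj)
    g = [[0.0] * n_theta for _ in range(n_rho)]
    for l in range(n_rho):
        # Radii grow geometrically from 1 pixel to r_max.
        radius = r_max ** (l / (n_rho - 1)) if n_rho > 1 else r_max
        for m in range(n_theta):
            angle = 2.0 * math.pi * m / n_theta
            i = int(round(ci + radius * math.cos(angle)))
            j = int(round(cj + radius * math.sin(angle)))
            i = min(max(i, 0), rows - 1)  # clamp to the image
            j = min(max(j, 0), cols - 1)
            g[l][m] = f[i][j]
    return g


def dft_modulus(g):
    """Modulus of the 2D DFT, computed directly (step c)."""
    rows, cols = len(g), len(g[0])
    h = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            acc = 0j
            for i in range(rows):
                for j in range(cols):
                    angle = 2.0 * math.pi * (u * i / rows + v * j / cols)
                    acc += g[i][j] * cmath.exp(-1j * angle)
            h[u][v] = abs(acc)
    return h


def mean_and_variance(values):
    mean = sum(values) / len(values)
    return mean, sum((x - mean) ** 2 for x in values) / len(values)


def shape_term(f, n_rho=4, n_theta=8):
    """Mean and variance of two projections of H (steps d1 and d2)."""
    h = dft_modulus(log_polar(f, n_rho, n_theta))
    projections = [
        [sum(row) for row in h],        # sums along each row
        [sum(col) for col in zip(*h)],  # sums along each column
    ]
    term = []
    for projection in projections:
        term.extend(mean_and_variance(projection))
    return term
```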